#pragma multi and rules with multiple outputs in GNU make

Recently we released ElectricAccelerator 6.2, which introduced a new bit of makefile syntax — #pragma multi — which allows you to indicate that a single rule produces multiple outputs. Although this is a relatively minor enhancement, I’m really excited about it because this it represents a new direction for emake development: instead of waiting for the GNU make project to add syntactic features and then following some time later with our emulation, we’re adding features that GNU make doesn’t have — and hopefully they will have to follow us for a change!

Unfortunately I haven’t done a good job articulating the value of #pragma multi. Unless you’re a pretty hardcore makefile developer, you probably look at this and think, “So what?” So let’s take a look at the problem that #pragma multi solves, and why #pragma multi matters.

Rules with multiple outputs in GNU make

The problem we set out to solve is simply stated: how can you specify to GNU make that one rule produces two or more output files? The obvious — but wrong — answer is the following:

1
2
foo bar: baz
touch foo bar

Unfortunately, this fragment is interpreted by GNU make as declaring two rules, one for foo and one for bar — it just so happens that the command for each rule creates both files. That will do more-or-less the right thing if you run a from-scratch, serial build:

$ gmake foo bar
touch foo bar
gmake: `bar' is up to date.

By the time GNU make goes to update bar, it’s already up-to-date thanks to the execution of the rule for foo. But look what happens when you run this same build in parallel:

$ gmake -j 2 foo bar
touch foo bar
touch foo bar

Oops! — the files were updated twice. No big deal in this trivial example, but it’s not hard to imagine a build where running the commands to update a file twice would produce bogus output, particularly if those updates could be happening simultaneously.

So what’s a makefile developer to do? In standard GNU make syntax, there’s only one truly correct way to create a rule with multiple outputs: pattern rules:

1
2
%.x %.y: %.in
touch $*.x $*.y

In contrast with explicit rules, GNU make interprets this fragment as declaring a single rule that produces two output files. Sounds perfect, but there’s a significant limitation to this solution: all of the output files must share a common sequence in the filenames (called the stem in GNU make parlance). That is, if your rule produces foo.x and foo.y, then pattern rules will work for you because the outputs both have foo in their names.

If your output files do not adhere to that naming limitation, then pattern rules can’t help you. In that case, you’re pretty much out of luck: there is no way to correctly indicate to GNU make that a single rule produces multiple output files. There are a variety of hacks you can try to coerce GNU make to behave properly, but each has its own limitations. The most common is to nominate one of the targets as the “primary”, and declare that the others depend on that target:

1
2
3
bar: foo
foo: baz
touch foo bar

Watch what happens when you run this build serially from scratch:

$ gmake foo bar
touch foo bar
gmake: Nothing to be done for `bar'.

Not bad, other than the odd “nothing to be done” message. At least the files weren’t generated twice. How about running it in parallel, from scratch?

$ gmake -j 2 foo bar
touch foo bar
gmake: Nothing to be done for `bar'.

Awesome! We still have the odd “nothing to be done” message, but just as in the serial build, the command was only invoked one time. Problem solved? Nope. What happens in an incremental build? If you’re lucky, GNU make happens to do the right thing and regenerate the files. But in one incremental build scenario, GNU make utterly fails to do the right thing. Check out what happens if the secondary output is deleted, but the primary is not:

$ rm -f bar && gmake foo bar
gmake: `foo' is up to date.
gmake: Nothing to be done for `bar'.

That’s right: GNU make failed to regenerate bar. If you’re very familiar with the build system, you might realize what had happened and think to either delete foo as well, or touch baz so that foo appears out-of-date (which would cause the next run to regenerate both outputs). But more likely at this point you just throw your hands up and do a full clean rebuild.

Note that all of the alternatives in vanilla GNU make have similar deficiencies. This kind of nonsense is why incremental builds have a bad reputation. This is why we created #pragma multi.

Rules with multiple outputs in Electric Make

By default Electric Make emulates GNU make, so it inherits all of GNU make’s limitations regarding rules with multiple outputs — with one critical exception. Even when running a build in parallel, Electric Make ensures that the output matches that produced by a serial GNU make build, which means that even the original, naive attempt will “work” for full builds regardless of whether the build is serial (single agent) or parallel (multiple agents).

Given that foundation, why did we bother with #pragma multi? There are a couple reasons:

  1. Correct incremental builds: with #pragma multi you can correctly articulate the relationships between inputs and outputs and thus ensure that all the outputs get rebuilt in incremental builds, rather than using kludges and hoping for the best.
  2. Out-of-the-box performance: although Electric Make guarantees correct output of the build, if you don’t have an up-to-date history file for the build you may waste time and compute resources running commands that don’t need to be run (work that will eventually be discarded when Electric Make detects the error). In the examples shown here the cost is negligible, but in real builds it could be significant.

Using #pragma multi is easy: just add the directive before the rule that will generate multiple outputs:

1
2
3
#pragma multi
foo bar: baz
touch foo bar

Watch what happens when this makefile is executed with Electric Make:

$ emake foo bar
touch foo bar

Note that there is no odd “is up to date” or “nothing to be done” message for bar — because Electric Make understands that both outputs are created by a single rule. Let’s verify that the build works as desired in the tricky incremental case that foiled GNU make — deleting bar without deleting foo:

$ rm -f bar && emake foo bar
touch foo bar

As expected, both outputs are regenerated: even though foo existed, bar did not, so the commands were executed.

Summary: rules with multiple outputs

Let’s do a quick review of the strategies for creating rules with multiple outputs. For simplicity we can group them into three categories:

  • #pragma multi
  • The naive approach, which does not actually create a single rule with multiple outputs at all.
  • Any of the various hacks for approximating rules with multiple outputs.

Here’s how each strategy fares across a variety of build modes:

Electric Make GNU make
Full (serial) Full (parallel) Incremental Full (serial) Full (parallel) Incremental
#pragma multi N/A
Naive
Hacks


The table paints a grim picture for GNU make: there is no way to implement rules with multiple outputs using standard GNU make which reliably gives both correct results and good performance across all types of builds. The naive approach generates the output files correctly in serial builds, but may fail in parallel builds. The various hacks work for full builds, but may fail in incremental builds. Even in cases where the output files are generated correctly, the build is marred by spurious “is up to date” or “nothing to be done for” messages — which is why most of the entries in the GNU make side are yellow rather than green.

In contrast, #pragma multi allows you to correctly generate multiple outputs from a single rule, for both full and incremental builds, in serial and in parallel. The naive approach also “works” with Electric Make, in that it will produce correct output files, but like GNU make the build is cluttered with spurious warnings. And, unless you have a good history file, the naive approach can trigger conflicts which may negatively impact build performance. Finally, despite its sophisticated conflict detection and correction smarts, even Electric Make cannot ensure correct incremental builds when you’ve implemented one of the multiple output hacks.

So there you have it. This is why we created #pragma multi: without it, there’s just no way to get the job done quickly and reliably. You should give ElectricAccelerator a try.

try_eade_button2

What’s new in ElectricAccelerator 6.2?

We released ElectricAccelerator 6.2 a couples weeks ago, our 25th feature release. 6.2 was a quick interim release primarily intended to address a couple long-standing stability issues, but we managed to squeeze in some really interesting feature enhancements as well. Here’s what’s new:

Rules with multiple outputs? Yeah, we can do that.

Every now and then, makefile authors need to write a single makefile rule that produces more than one output file, to accomodate tools that don’t fit gmake’s rigid one-command-one-output model. The classic example is bison, which produces both a C file and a header file from a single invocation of the tool.

Unfortunately in regular gmake the only way to write a rule with multiple outputs is to use a pattern rule. That’s great — if your needs happens to dovetail with the caveats and limitations of pattern rules (chiefly, that the output files share a common base name). If not, the answer has been basically that you’re out of luck. There are a variety of kludges that approximate the behavior, but despite numerous requests over the last decade (1, 2, 3, 4, 5, 6, 7, 8) and at least one patch implementing the feature, GNU make (as of 3.82) still has no way to create an explicit rule that produces multiple outputs.

When it comes to enhancements to the fundamental operation of GNU make, we’ve historically let the GNU make team take the lead, rather than risk introducing potentially incompatible changes. But after so many years it seems clear that this feature is not going to show up in GNU make — so we decided to forge ahead on our own. Enter #pragma multi:

1
2
3
#pragma multi
foo bar:
@touch foo bar

GNU make interprets this construct as two independent rules, one for foo and one for bar, which happen to each create both files. Thanks to the #pragma multi designation, Electric Make will interpret this as a single rule which produces both foo and bar. Using a #pragma to flag the rule is perfect, because it sidesteps any questions about syntax changes. And since #pragma starts with a #, GNU make will treat it as a comment, so this makefile will still be usable with GNU make — you’ll just get correct behavior and better performance with Electric Make.

New platforms and a faster installer

Accelerator 6.2 adds support for Linux kernels up to 3.5.x, which means that Accelerator now supports the following platforms:

  • Ubuntu 11.10
  • Ubuntu 12.04
  • SUSE Linux Enterprise Server 11 SP2

In addition, Accelerator 6.2 is expected to work correctly on both Ubuntu 12.10 and Windows 8, although we cannot officially claim support for those platforms since they were themselves not finalized at the time Accelerator 6.2 was released. This release also incorporates enhancements to the Linux installer which make the installation process about 25% faster compared to previous releases.

A complete list of platforms supported by ElectricAccelerator 6.2 can be found in the Electric Cloud Knowledge Base.

Key robustness improvements

Raise your hand if you’ve ever seen this error on your Linux Accelerator agent hosts:

unable to unmount EFS at “/some/path”: EBUSY

That error shows up sometimes when your build starts background processes — kind of a distributed build anti-pattern itself, but unfortunately it’s not always something you can control thanks to some third-party toolchains. Or rather, that error used to show up sometimes, because in Accelerator 6.2 we’ve bulletproofed the system against such rogue background processes, so that error is a thing of the past (nota bene: this enhancement is not available on Solaris).

In addition, we bulletproofed the system against external processes (any process running on an agent host which is not part of your build) accessing the EFS. In certain rare circumstances, such accesses could lead to agent host instability.

What’s next?

With 6.2 out the door we’ve finally got bandwidth to work on 7.0, which will focus on some very exciting performance improvements, especially for incremental builds. It’s a little bit too early to share any of the preliminary results we’re seeing, but rest assured — if you thought Accelerator was fast before, well… you ain’t seen nothing yet! Stay tuned for more information.

ElectricAccelerator 6.2 is available immediately. If you are already an Accelerator user, contact support@electric-cloud.com to upgrade. If you are not currently a user, you can download a free evaluation version of ElectricAccelerator Developer Edition, or contact sales@electric-cloud.com.

Fixing recursive make

Recursive make is one of those things that everybody loves to hate. It’s even been the subject of one of those tired “… Considered Harmful” diatribes. According to popular opinion, recursive make will sap performance from your build, make it nigh impossible to ensure correctness in parallel builds, and may render the user sterile. OK, maybe not that last one. But seriously, the arguments against recursive make are legion, and deeply entrenched. The problem? They’re flawed. That’s because they assume there’s only one way to implement recursive make — when the submake is invoked, the parent make is blocked until the submake completes. That’s how almost everybody does it. But in Electric Make, part of ElectricAccelerator, we developed a novel new approach called non-blocking recursive make. This design eliminates the biggest problems attributed to recursive make, without requiring a painful and costly conversion of your build system to non-recursive make.

The problem with traditional recursive make

There’s really just two problems at the heart of complaints with traditional recursive make: first, there’s no way to ensure correctness of a parallel recursive make based build without overserializing the submakes, because there’s no way to articulate dependencies between individual targets in different submakes. That means you can’t have a dependency graph that is both correct and precise. Instead you either leave out the critical dependency entirely, which makes parallel (ie, fast) builds unreliable; or you serialize submakes in their entirety, which shackles build performance because no part of a submake with even a single dependency on some portion of an earlier submake can begin until the entire ealier submake completes. Second, even if there were a way to specify precise dependencies between targets in different submakes, most versions of make have implemented recursive make such that the parent make is blocked from proceeding until the submake has completed. Consider a typical use of recursive make with implicit serializations between submakes:

1
2
3
4
all:
@for dir in util client server ; do \
$(MAKE) -C $$dir; \
done

Each submake compiles a bunch of source files, then links them together into a library (util) or an executable (client and server). The only actual dependency between the work in the three make instances is that the client and server programs need the util library. Everything else is parallelizable, but with traditional recursive make, gmake is unable to exploit that parallelism: all of the work in the util submake must finish before any part of the client submake begins!

Conflict detection and non-blocking recursive make

If you’re familiar with Electric Make, you already know how it solves the first half of the recursive make problem: conflict detection and correction. I’ve written about conflict detection before, but here’s a quick recap: using the explicit dependencies given in the makefiles and information about the files accessed as each target is built, emake is able to dynamically determine when targets have been built too early due to missing explicit dependencies, and rerun those targets to generate the correct output. Electric Make can ensure the correctness of parallel builds even in the face of incomplete dependencies, even if the missing dependencies are between targets in different submakes. That means you need not serialize entire submakes to ensure the build will run correctly in parallel.

Like an acrobat’s safety net, conflict detection allows us to consider solutions to the other half of the problem that would otherwise be considered risky, if not outright madness. In fact, our solution would not be possible without conflict detection: non-blocking recursive make. This is analogous to the difference between blocking and non-blocking I/O: rather than waiting for a recursive make to finish, emake carries on executing subsequent commands in the build immediately, including other recursive makes. Conflict detection ensures that only the commands in each submake which require serialization are executed sequentially, so the build runs as quickly as possible, but the final build output is identical to a serial build.

The impact of this change is dramatic. Here I’ve plotted the execution of the simple build defined above on four cores, using both gmake (normal recursive make) and emake (non-blocking recursive make):

Recursive make build with gmake


Recursive make build with emake

Electric Make is able to execute this build about 20% faster than gmake, with no changes to the Makefiles or the execution environment. emake is literally able to squeeze more parallelism out of recursive-make-based builds than gmake. In fact, we can precisely quantify just how much more parallelism emake gets through an application of Amdahl’s law. First, we compute the best possible speedup for the build — that’s just the serial runtime divided by the best possible parallel runtime, which we can figure out through analysis of the depedency graph and runtime of individual jobs in the build (the Longest Serial Chain report in ElectricInsight can do this for you). Then we can compute the parallelizable portion P of the build by plugging the speedup S into this equation: P = 1 – (1 / S). Here’s how that works out for gmake and emake:

gmake emake
Serial baseline 65s 65s
Best build time 13.5s 7.5s
Best speedup 4.8x 8.7x
Parallel portion 79% 89%

On this build, non-blocking recursive make increases the parallel portion of the build by 10%. That may not seem like much, but Amdahl’s law shows how dramatically that difference affects the speedup you can expect as you apply more cores:

Implementation

On the backend, non-blocking recursive make is handled by conflict detection — the jobs from the recursive make are checked for conflicts in the serial order defined by the makefile structure. Any issues caused by aggressively running recursive makes early are detected during the conflict check, and the target that ran too early is rerun to generate the correct result.

On the frontend, emake uses a strategy that is at once both brilliant in its simplicity, and diabolical in its trickery. It starts with an environment variable. When emake is invoked recursively, it checks the value of EMAKE_BUILD_MODE. If it is set to node, emake runs in so-called stub mode: rather than executing the submake (parsing the makefile and building targets), emake captures the invocation context (working directory, command-line and environment) in a file on disk, prints a “magic” string and exits with a zero status code.

The file containing the invocation context is identified by a second environment variable, ECLOUD_RECURSIVE_COMMAND_FILE. The Accelerator agent (which handles invoking commands on behalf of emake) checks for the presence of that file after every command that is run. If it is found, the agent relays the content to the toplevel emake invocation, where a new make instance is created to represent the submake invocation. That instance comes with it’s own parse job of course, which gets inserted into the queue of jobs. Some (short) time later, the parse job will run, discover whatever work must be run by the submake, and create additional rule jobs.

The magic string — EMAKE_FNORD — serves as a placeholder in the stdout stream for the jobs, so emake can figure out which portion of the output text comes before and which portion comes after the submake. This ensures that the build output log is identical to that generated by a serialized gmake build. For example, given the following rule that invokes a submake, you’d expect to see the “Before” and “After” messages printed before and after the output generated by commands in the submake itself:

1
2
3
4
all:
@echo Before util ; \
@$(MAKE) -C util ; \
@echo After util

With non-blocking recursive make, the submake has not actually executed when the “echo After util” command runs. If emake doesn’t account for that reordering, both the “Before” and “After” messages will appear before any of the output from the submake. EMAKE_FNORD allows emake to “stitch” the output together so the build log matches a serial log.

Limitations

Conflict detection and non-blocking recursive make together solve the main problems associated with recursive make. But there are a couple scenarios where non-blocking recursive make does not work well. Fortunately, these are uncommon in practice and easily addressed.

Capturing recursive make stdout

The first scenario is when the build captures the output of the recursive make invocation, rather than letting it print to stdout as normal. Since emake defers the execution of the submake and prints only EMAKE_FNORD to stdout, this will not work. There are two reasons you might do this: first, you might want to have separate build logs for each submake, to simplify error detection and management. In this situation, the simplest workaround is to remove the redirection and instead us emake’s annotated build log, an XML version of the build output log which can be easily processed using standard tools. Second, you may be using make as a text-processing tool (sort of a “poor man’s” Perl), rather than for building per se:

1
2
3
all:
@$(MAKE) -f genlist.mk > objects.txt
@cat objects.txt | xargs rm

In this case, the workaround is to explicitly force emake to run in so-called “local” mode, which means emake will handle the recursive make invocation as a blocking invocation, just like traditional make would. You can force emake into local mode by adding EMAKE_BUILD_MODE=local to the environment before the recursive make invocation.

Immediate consumption of build products

The second scenario is when the build consumes the product of the submake in the same command that contains the invocation. For example:

1
2
all:
@$(MAKE) -C sub foo && cp sub/foo ./foo

Here the build assumes that the output files generated by the submake will be available for use immediately after the submake completes. Obviously this is not the case with non-blocking recursive make — when the invocation of $(MAKE) -C sub foo completes, only the submake stub has actually finished. The build products will not be available until after the submake is actually processed later. Note that in this build both the recursive make invocation and the commands that use the build products from that invocation are treated as a single command from the perspective of make: make actually invokes the shell, and the shell then runs the recursive make and cp commands.

The workaround is simple: split the consumer into a distinct command, from the perspective of make:

1
2
3
all:
@$(MAKE) -C sub foo
@cp sub/foo ./foo

With that trivial change, emake is able to treat the cp as a continuation job, which can be serialized against the completion of the recursive make as needed.

A fix for recursive make

For years, people have heaped scorn and criticism on recursive make. They’ve nearly convinced everybody that even considering its use is automatically wrong — you probably can’t help feeling a little bit guilty when you use recursive make. But the reality is that recursive make is a reasonable way to structure a large build. You just need a better make. With conflict detection and non-blocking recursive make, Electric Make has fixed the problems usually associated with recursive make, so you can get parallel builds that are both fast and correct. Give it a try!

ElectricAccelerator and the Case of the Confounding Conflict

A user recently asked me why ElectricAccelerator reports a conflict in this simple build, when executed without a history file from a previous run:

1
2
3
4
5
6
7
all: foo symlink_to_foo
foo:
@sleep 2 && echo hello world > foo
symlink_to_foo:
@ln -s foo symlink_to_foo

Specifically, if you have at least two agents, emake will report a conflict between symlink_to_foo and foo, indicating that symlink_to_foo somehow read or otherwise accessed foo during execution! But ln does not access the target of a symlink when creating the symlink — in fact, you can even create a symlink to a non-existent file if you like. It seems obvious that there should be no conflict. What’s going on?

To understand why this conflict occurs, you have to wrap your head around two things. First, there’s more going on during a gmake-driven build than just the commands you see gmake invoke. That causes the usage that provokes the conflict. Second, emake considers a serial gmake build the “gold standard” — if a serial gmake build produces a particular result, so too must emake. That’s why the additional usage must result in a conflict.

In this case, the usage that triggers the conflict comes from management of the gmake stat cache. This is a gmake feature that was added to improve performance by avoiding redundant calls to stat() — once you’ve stat()‘d a file once, you don’t need to do it again. Unless the file is changed of course, which happens quite a lot during a build. To keep the stat cache up-to-date as the build progresses, gmake re-stat()‘s each target after it finishes running the commands for the target. So after the commands for symlink_to_foo complete, gmake stat()‘s symlink_to_foo again, using the standard stat() system call, which follows the symlink (in contrast to lstat(), which does not follow the symlink). That means gmake will actually cache the attributes of foo for symlink_to_foo.

To ensure compatibility with gmake, emake has to do the same. In Accelerator parlance, that means we get read usage on symlink_to_foo (because you have to read the symlink itself to determine the target of the symlink), and lookup usage on foo. The lookup on foo causes the conflict, because, of course, you will get a different result if you lookup foo before the job that creates it than you would get if you do the lookup after that job. Before the job, you’ll find that foo does not exist, obviously; after, you’ll find that it does.

But what difference does that make, really? In truth, if there’s no detectable difference in behavior, then it doesn’t matter at all. And in the example build there is no detectable difference — the build output is the same regardless of when exactly you stat() symlink_to_foo relative to when foo is created. But with a small modification to the build, it is suddenly becomes possible to see the impact:

1
2
3
4
5
6
7
8
9
10
all: foo symlink_to_foo reader
foo:
@sleep 2 && echo hello world > foo
symlink_to_foo:
@ln -s foo symlink_to_foo
reader: foo symlink_to_foo
@echo newer prereqs are: $?

Compare the output when this build is run serially with the output when the build is run in parallel — and note that I’m using gmake, so you can be certain I’m not trying to trick you with some peculiarity of emake’s implementation:

You can plainly see the difference: in the parallel build gmake stat()‘s symlink_to_foo before foo exists, so the stat cache records symlink_to_foo as non-existent. Then when gmake generates the value of $? for reader, symlink_to_foo is excluded, because non-existent files are never considered newer than existing files. In the serial build, gmake stat()‘s symlink_to_foo after foo has been created, so the stat cache indicates that symlink_to_foo exists and is newer than reader, so it is included in $?.

Hopefully you see now both what causes the conflict, and why it is necessary. The conflict occurs because of lookup usage generated when updating the stat cache. The conflict is necessary to ensure that the build output matches that produced by a serial gmake — the “gold standard” for build correctness. If no conflict is declared, there is the possibility for a detectable difference in build output compared to serial gmake.

However, you might be thinking that although it makes sense to treat this as a conflict in the general case, isn’t it possible to do something smarter in this specific case? After all, the orignal example build does not use $?, and without that there isn’t any detectable difference in the build output. So why not skip the conflict?

The answer is simple, if a bit disappointing. In theory it may be possible to elide the conflict by checking to see if the symlink is used by a later job in a manner that would produce a detectable difference (for example, by scanning the commands for subsequent targets for references to $?), but in reality the logistics of that check are daunting, and I’m not confident that we could guarantee correct behavior in all cases.

Fortunately all is not lost. If you wish to avoid this conflict, you have several options:

  1. Use a good history file from a previous build. This is the most obvious solution. You’ll only get conflicts if you run without a history file.
  2. Add an explicit dependency. If you make foo an explicit prereq of symlink_to_foo, then you will avoid the conflict. Here’s how that would look:
    1
    symlink_to_foo: foo
  3. Change the serial order. If you reorder the makefile so that symlink_to_foo has an earlier serial order than foo you will avoid the conflict. That just requires a reordering of the prereqs of all:
    1
    all: symlink_to_foo foo

Any one of these will eliminate the conflict from your build, and you’ll enjoy fast and correct parallel builds.

Case closed.

Makefile hacks: automatically split long command lines

If you’ve worked on a large build system you’ve probably bumped into this error, or one like this:

gmake: execvp: /bin/sh: Argument list too long

This error means the length of some command-line in your makefile has grown past the system limit, which is typically in the 32 to 256 kilobyte range. It’s surprisingly easy to hit that limit. You start with a small list of object files to be linked together. Over time you add more, and the command-line gets a little longer. Add a few more and it gets longer still. Before you know it you have a monster command-line and your build starts failing.

The solution to this problem is simple: split the long command-line into several shorter command-lines. For example, ar r libraries/lib.a objects/foo.o objects/bar.o objects/baz.o objects/boo.o objects/bang.o becomes something like this:

ar r libraries/lib.a objects/foo.o objects/bar.o
ar r libraries/lib.a objects/baz.o objects/boo.o
ar r libraries/lib.a objects/bang.o

Simple in theory, but tedious to do by hand. And doing it manually is like putting a ticking time-bomb into your makefile — it’s only a matter of time before your build grows enough that you have to go through this exercise again.

I recently ran across a clever solution that exploits the $(eval) function in GNU make to split long command-lines automatically, eliminating the tedium and the time-bomb. After I show you the solution, I’ll explain it piece-by-piece.

The max_args function

The solution is a user-defined function called max_args that splits long command-lines into equal-length chunks:

1
2
3
4
5
6
7
8
9
define max_args
$(eval _args:=)
$(foreach obj,$3,$(eval _args+=$(obj))$(if $(word $2,$(_args)),$1$(_args)$(EOL)$(eval _args:=)))
$(if $(_args),$1$(_args))
endef
define EOL
endef

And an example of its use:

1
2
3
OBJS:=a b c d e f g h
all:
@$(call max_args,echo,2,$(OBJS))

The max_args function takes three parameters: the base command-line, the number of arguments per “chunk”, and the complete list of arguments. It expands to a series of command-lines — one for each chunk of arguments.

The trick behind max_args is the use of $(eval) to update a variable as a side-effect of gmake’s regular variable expansion activity. If you’re not familiar with gmake variable expansion, here’s a quick rundown: when gmake finds a variable or function reference, like $(something), it replace the entire reference with an expanded value. In the case of a variable that’s just the value of the variable. Most variables in gmake are recursive which means that if the variable value itself contains embedded variable references, those will be expanded as well, recursively. In the case of a function, gmake evaluates the function, and replaces the reference with the computed value.

The meat of max_args is on line 3. It starts with the $(foreach) function, which evaluates its third argument, the body of the loop, once for each word in its second argument — in this case, the list of objects passed in the call to max_args.

In max_args, the loop body has two components. The first is a call to $(eval), which simply appends the current value of the loop variable to an accumulator called _args.

The second component of the loop body uses $(if) and $(word) to check the length of _args. The $(word) function returns the nth word from a list, or an empty string if there are fewer than n words in the list. The $(if) function expands its second argument (the then clause) only if its first argument (the condition) expands to a non-empty string, so together these functions check if _args has the desired number of words, and if so the then clause of the $(if) is expanded.

The then clause of this $(if) has two components. The first constructs a completed command-line by concatenating the base command-line, here given by $1, the first argument to the original max_args call; the accumulated arguments; and a newline character. Thanks to the rules of gmake expansion, this command-line is added to the overall expansion result for the max_args function. The second part of the then clause uses $(eval) to reset the accumulator

If the chunk size does not evenly divide the number of arguments, the stragglers are emitted in a final command-line on the last line of max_args.

Limitations

max_args is handy but it has one significant limitation: command-line length limits are based on the number of bytes in the command-line, not the number of words, in it. Unfortunately, gmake has no built-in way to count the number of characters in a string. gmake does provide the $(words) built-in, so that’s what max_args uses. That just means that to use it effectively you have to take a guess at the number of arguments that will fit in a single command-line, for example by dividing the length limit by the average number of characters in each argument, then subtracting some to allow some buffer for outliers.

How ElectricMake guarantees reliable parallel builds

Parallel execution is a popular technique for reducing software build length, and for good reason. These days, multi-core computers have become standard — even my laptop has four cores — so there’s horsepower to spare. And it’s “falling over easy” to implement: just slap a “-j” onto your make command-line, sit back and enjoy the benefits of a build that’s 2, 3 or 4 times faster than it used to be. Sounds great!

But then, inevitably, invariably, you run into parallel build problems: incomplete dependencies in your makefiles, tools that don’t adequately uniquify their temp file names, and any of a host of other things that introduce race conditions into your parallel build. Sometimes everything works great, and you get a nice, fast, correct build. Other times, your build blows up in spectacular fashion. And then there are the builds that appear to succeed, but in fact generate bogus outputs, because some command ran too early and used files generated in a previous instead of the current build.

This is precisely the problem ElectricMake was created to solve — it gives you fast, reliable parallel builds, regardless of how (im)perfect your makefiles and tools are. If the build works serially, it will work with ElectricMake, but faster. If you’ve worked with parallel builds for any length of time, you can probably appreciate the benefit of that guarantee.

But maybe you haven’t had much experience with parallel builds yourself, or maybe you have but like many people, you don’t believe this problem can actually be solved. In that case, perhaps some data will persuade you. Here’s a sample of open source projects that don’t build reliably in parallel using gmake:

For each, I did several trials with gmake at various levels of parallelism, to determine how frequently the parallel build fails. Then, I did the same build several times with emake and again measured the success rate. Here you can see the classic problem of parallel builds with gmake — works great at low levels of parallelism (or serially, the “degenerate” case of parallel!), but as you ratchet up the parallelism, the build gets less and less reliable. At the same time, you can see that emake is rock solid regardless of how much parallelism you use:

Parallel build success rates

The prize for this reliability? Faster builds, because you can safely exploit more parallelism. Where gmake becomes unreliable with -j 3 or -j 4, emake is reliable with any number of parallel jobs.

How ElectricMake guarantees reliable parallel builds

The technology that enables emake to ensure reliable parallel builds is called conflict detection. Although there are many nuances to its implementation, the concept is simple. First, track every modification to every file accessed by the build as a distinct version of the file. Then, for each job run during the build, track the files used and verify that the job accessed the same versions it would have had the build run serially. Any mismatch is considered a conflict. The offending job is discarded along with any filesystem modifications it made, and the job is rerun to obtain the correct result.

The versioned file system

At the heart of the conflict detection system is a data structure known as the versioned file system, in which emake records every version of every file used over the lifetime of the build. A version is added to the data structure every time a file is modified, whether that be a change to the content of the file, a change in the attributes (like ownership or access permissions), or the deletion of the file. In addition to recording file state, a version records the job which created it. For example, here’s what the version chain looks like for a file “foo” which initially does not exist, then is created by job A with contents “abc”, deleted by job C, and recreated by job E with contents “123”:

Jobs

Jobs are the basic unit of work in emake. A job represents all the commands that must be run in order to build a single makefile target. In addition, every job has a serial order — the order in which the job would have run, had the build been run serially. The serial order of a job is dictated by the dependencies and structure of the makefiles that make up the build. Note that for a given build, the serial order is deterministic and unambiguous — even if the dependencies are incomplete, there is exactly one order for the jobs when the build is run serially.

With the serial order for every job in hand, deciding which file version should be used by a given job is simple: just find the version created by the job with the greatest serial order that precedes the job accessing the file. For example, using the version chain above (and assuming that the jobs’ names reflect their serial order), job B should use the version created by job A, while job D should see the file as non-existent, thanks to the version created by job C.

A job enters the completed state once all of its commands have been executed. At that point, any filesystem updates created by the job are integrated into the versioned filesystem, but, critically, they are not pushed to the real filesystem — that gives emake the ability to discard the updates if the job is later found to have conflicts.

Each job runs against a virtual filesystem called the Electric File System (EFS), rather than the real filesystem. The EFS serves several important functions: first, it is the means by which emake tracks file accesses. Second, it enables commands in the build to access file versions that exist in the versioned filesystem, but not yet on the real filesystem. Finally, it isolates simultaneously running jobs from one another, eliminating the possibility of crosstalk between commands.

Detecting conflicts

With all the data emake collects — every version of every file, and the relationship between every job — the actual conflict check is simple: for each file accessed by a job, compare the actual version to the serial version. The actual version is the version that was actually used when the job ran; the serial version is the version that would have been used, if the build had been run serially. For example, consider a job B which attempts to access a file foo. At the time that B runs, the version chain for foo looks like this:

Given that state, B will use the initial version of foo — there is no other option. The initial version is therefore the actual version used by job B. Later, job A creates a new version of foo:

Since job A precedes job B in serial order, the version created by job A is the correct serial version for job B. Therefore, job B has a conflict.

If a job is determined to be free of conflicts, the job is committed, meaning any filesystem updates are at last applied to the real filesystem. Any job that has a conflict is reverted — all versions created by the job are marked invalid, so subsequent jobs will not use them. The conflict job is then rerun in order to generate the correct result. The rerun job is committed immediately upon completion.

Conflict checks are carried out by a dedicated thread which inspects each job in strict serial order. That guarantees that a job is not checked for conflicts until after every job that precedes it in serial order has been successfully verified free of conflicts — without this guarantee, we can’t be sure that we know the correct serial version for files accessed by the job. Similarly, this ensures that the rerun job, if any, will use the correct serial versions for all files — so the rerun job is sure to be conflict free.

ElectricMake: reliable parallel builds

Conceptually, conflict detection is simple — keep track of every version of every file used in a build, then verify that each job used the correct version — but there are many details to its implementation. And in this article I’ve only covered the most basic implementation of conflict detection — after many years of experience and thousands of real-world builds we’ve tweaked the implementation, relaxing the simple definition of a conflict in specific cases in order to improve performance.

The benefit of conflict detection is simple too: reliable parallel builds, which in turn means shorter build times, regardless of how imperfect your makefiles are and how parallel-unsafe your toolchain may be.

Why is SCons so slow?

UPDATE: If you’re coming from Why SCons is not slow, you should read my response

A while back, I did a series of posts exploring the performance of SCons on builds of various sizes. The results were dismal: SCons demonstrated a classic O(n2) growth in runtime, meaning that the length of the build grew in proportion to the square of the number of files in the build, rather than linearly as one would hope. Naturally, that investigation and its results provoked a great deal of discussion at the time and since. Typically, SCons advocates fall back on one particular argument: “Sure, SCons may be slow,” they say, “but that’s the price you pay for a correct build.” Recently, Eric S. Raymond wrote an article espousing this same fundamental argument, with the addition of some algorithmic analysis intended to prove mathematically that a correct build, regardless of the build tool, must necessarily exhibit O(n2) behavior — a clever bit of circular logic, because it implies that any build tool that does not have such abyssmal performance must not produce correct builds!

Naturally, after spending nearly a decade developing a high-performance replacement for GNU make, I couldn’t let that statement stand. This post is probably going to be on the long side, so here’s the tl;dr summary:

  • You can guarantee correct builds with make, provided you follow best practices.
  • The worst-case runtime of any build tool if, of course, O(n2), but most, if not all, builds can be handled in O(n) time, without sacrificing correctness.
  • SCons’ performance problem is caused by design and implementation decisions in SCons, not some pathology of build structure.

What is required to ensure a correct build?

One of the fundamental tenents of the pro-SCons mythos is the idea that it is unique in its ability to guarantee correct builds. In reality, SCons is not doing anything particularly special in this regard. It’s true that by virtue of its design SCons makes it easier to get it right, but there’s nothing keeping you from enjoying the same assurances in make.

First: what is a correct build? Simply put, a correct build is one in which everything that ought to be built, is built. Note that by definition, a from-scratch build is correct, since everything is built in that case. So the question of “correct” or “incorrect” is really only relevant in regards to incremental builds.

So, what do we need in order to ensure a correct incremental build? Only three things, actually:

  1. A single, full-build dependency graph.
  2. Complete dependency information for every generated file.
  3. A reliable way to determine if a file is up-to-date relative to its inputs.

What SCons has done is made it more-or-less impossible, by design, to not have these three things. There is no concept like recursive make in the SCons world, so the only option is a single, full-build dependency graph. Likewise, SCons automatically scans input files in several programming languages to find dependency information. Finally, SCons uses MD5 checksums for the up-to-date check, which is a pretty darn reliable way to verify whether a given file needs to be rebuilt.

But the truth is, you can guarantee correct builds with make — you just have to adhere to long-standing best practices for make. First, you have to avoid using recursive make. Then, you need to add automatic dependency generation. The only thing that’s a little tricky is the up-to-date check: make is hardwired to use file timestamps, which can be spoofed, deliberately or accidentally — although to be fair, in most cases, timestamps are perfectly adequate. But even here, there’s a way out. You can use a smarter version of make that has a more sophisticated up-to-date mechanism, like ElectricMake or ClearMake. You can even shoehorn MD5 checksums into GNU make, if you like.

I can’t deny that SCons has made it easier to get correct builds. But the notion that it can’t be done with make is simply absurd.

What is the cost of a correct build?

Now we turn to the question of the cost of ensuring correctness. At its core, any build tool is just a collection of graph algorithms — first contructing the dependency graph, then traversing it to find and update out-of-date files. These algorithms have well-understood complexity, typically given as O(n + e), where n is the number of nodes in the graph, and e is the number of edges. It turns out that e is actually the dominant factor here, since it is at least equal to n, and at worst as much as n2. That means we can simplify the complexity to O(n + n2), or just O(n2).

Does this absolve SCons of its performance sins? Unfortunately it does not, because O(n2) is a worst-case bound — you should only expect O(n2) behavior if you’ve got a build that has dependencies between every pair of files. Think about that for a second. A dependency between every. pair. of. files. Here’s what that would look like in makefile syntax:

all: foo bar foo.c bar.c foo.h bar.h
foo:     bar foo.c bar.c foo.h bar.h
bar:         foo.c bar.c foo.h bar.h
foo.c:             bar.c foo.h bar.h
bar.c:                   foo.h bar.h
foo.h:                         bar.h

It’s ridiculous, right? I don’t know about you, but I’ve certainly never seen a build that does anything even remotely like that. In particular, the builds I used in my benchmarks don’t look like that. Fortunately, those builds are small and simple enough that we can directly count the number of edges in the dependency graph. For example, the smallest build in my tests consisted of:

2,000 C sources
+ 2,004 headers
+ 2,000 objects
+ 101 libraries
+ 100 executables

6,205 total files

So we have about 6,000 nodes in the graph, but how many edges does the graph contain? Lucky for us, SCons will print the complete dependency graph if we invoke it with scons –tree=all:

+-.
  +-SConstruct
  +-d1_0
  | +-d1_0/SConstruct
  | +-d1_0/f00000_sconsbld_d1_0
  | | +-d1_0/f00000_sconsbld_d1_0.o
  | | | +-d1_0/f00000_sconsbld_d1_0.c
  | | | +-d1_0/lup001_sconsbld_d1_0/f00000_sconsbld_d1_0.h
  ...

The raw listing contains about 35,000 lines of text, but that includes duplicates and non-dependency information like filesystem structure. Filter that stuff out and you can see the graph contains only about 12,000 dependencies. That’s a far cry from the 1,800,000 or so you would expect if this truly were a “worst-case” build. It’s clear, in fact, that the number of edges is best described as O(n).

Although I don’t know how (or even if it’s possible) to prove that this is the general case, it does make a certain intuitive sense: far from being strongly-connected, most of the nodes in a build dependency graph have just one or two edges. Each C source file, for example, has just one outgoing edge, to the object file generated from that source. Each object file has just one outgoing edge too, to the library or executable the object is part of. Sure, libraries and headers probably have more edges, since they are used by multiple executables or objects, but the majority of the stuff in the graph is going to fall into the “small handful of edges” category.

Now, here’s the $64,000 question: if the algorithms in a build tool scale in proportion to the number of edges in the dependency graph, and we’ve just shown that the dependency graph in question has O(n) edges, why does SCons use O(n2) time to execute the build?

Why is SCons so slow?

SCons’ O(n2) performance stems from its graph traversal implementation. Essentially, SCons scans the entire dependency graph each time it is looking for a file to update. n scans of a graph with O(n) nodes and edges equals an O(n2) graph traversal. There’s no mystery here. In fact, the SCons developers are clearly aware of this deficiency, as described on their wiki:

It’s worth noting that the Jobs module calls the Taskmaster once for each node to be processed (i.e., it’s O(n)) and the Taskmaster has an amortized performance of O(n) each time it’s called. Thus, the overall time is O(n^2).

But despite recognizing this flaw, they severely misjudged its impact, because they go on to state that it requires a “pathological” dependency graph in order to elicit this worst-case behavior from SCons. As we’ve shown here and in previous posts, even a terribly mundane dependency graph elicits O(n2) behavior from SCons. I shudder to think what SCons would do with a truly pathological dependency graph!

Obviously the next question is: why does SCons do this? That’s not quite as easy for me to explain, as an outside observer. To the best of my understanding, they rescan the graph just in case new dependencies are added to the dependency graph while evaluating a node in the graph — remember, in SCons the commands to update a file are expressed in Python, so they can easily manipulate the dependency graph even while the build is running.

Is it really necessary to rescan the dependency graph over and over? I don’t think so. In fact, make is proof that it is not necessary. I think there are two ways that SCons could address this problem: first, it could adopt GNU make’s convention of partitioning the build into distinct phases, one that updates dependency information, and a second that actually executes the build. In GNU make, that strategy allows for the introduction of new dependency information, while imposing only a one-time O(n) cost for restarting the make process if any new dependencies are found.

Alternatively, SCons could probably be made smarter about when a full rescan is required. Most of the time, even if new dependencies are added to the graph, they are added to the node being evaluated, not to nodes that were already visited. That is, when you scan a source file for implicit dependencies, you find the dependencies for that file not for other files in the build (duh). So most of the time, a full rescan is massive overkill.

The final word…?

Hopefully this is my last post on the subject of SCons performance. It is clear to me that SCons does not scale to large projects, and that the problem stems from design and implementation decisions in SCons, rather than some pathology in the build itself. You can get comparable guarantees of correctness from make, if you’re willing to invest the time to do things the right way. The payoff is a build system that is not only correct but has vastly better performance than SCons as your project grows. Why wouldn’t you want that?