post

The Twelve Days of Christmas, GNU make style

Well, it’s Christmas Day in the States today, and while we’re all recovering from the gift-opening festivities, I thought this would be the perfect time for a bit of fun with GNU make. And what better subject matter than the classic Christmas carol “The Twelve Days of Christmas”? Its repetitive structure is perfect for demonstrating how to use several of GNU make’s built-in functions for iteration, selection and sorting. This simple makefile prints the complete lyrics to the song:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
L01=Twelve drummers drumming,
L02=Eleven pipers piping,
L03=Ten lords-a-leaping,
L04=Nine ladies dancing,
L05=Eight maids-a-milking,
L06=Seven swans-a-swimming,
L07=Six geese-a-laying,
L08=Five golden rings,
L09=Four calling birds,
L10=Three french hens,
L11=Two turtle doves, and
L12=A partridge in a pear tree!
LINES=12 11 10 09 08 07 06 05 04 03 02 01
DAYS=twelfth eleventh tenth ninth \
eighth seventh sixth fifth \
fourth third second first
$(foreach n,$(LINES),\
$(if $(X),$(info ),$(eval X=X))\
$(info On the $(word $n,$(DAYS)) day of Christmas,)\
$(info my true love gave to me)\
$(foreach line,$(wordlist $n,12,$(sort $(LINES))),\
$(info $(L$(line)))))
all: ; @:

By count, most of the lines here just declare variables, one for each item mentioned in the song. Note how the items are ordered: the last item added is given the lowest index. That means that to construct each verse we simply enumerate every item in the list, in order, starting with the new item in each verse.

Line 18 is where the real meat of the makefile begins. Here we use GNU make’s foreach function to iterate through the verses. $(foreach) takes three arguments: a name for the iteration variable, a space-separated list of words to assign to the iteration variable in turn, and a body of text to expand repeatedly, once for each word in the list. Here, the list of words is given by LINES, which lists the starting line for each verse, in order — that is, the first verse starts from line 12, the second from line 11, etc. The text to expand on each iteration is all the text on lines 19-23 of the makefile — note the use of backslashes to continue each line to the next.

Line 19 uses several functions to print a blank line before starting the next verse, if we’ve printed a verse already: the $(if) function, which expands its second argument if its first argument is non-empty, and its third argument if its first argument is empty; the $(info) function to print a blank line; and the $(eval) function to set the flag variable. The first time this line is expanded, X does not exist, so it expands to an empty string and the $(if) picks the “else” branch. After that, X has a value, so the $(if) picks the “then” branch.

Lines 20 and 21 again use $(info) to print output — this time the prelude for the verse, like “On the first day of Christmas, my true love gave to me”. The ordinal for each day is pulled from DAYS using the $(word) function, which extracts a specified word, given by its first argument, from the space-separated list given as its second argument. Here we’re using n, the iteration variable from our initial $(foreach) as the selector for $(word).

Line 22 uses $(foreach) again, this time to iterate through the lines in the current verse. We use line as the iteration variable. The list of words is given again by LINES except now we’re using $(sort) to reverse the order, and $(wordlist) to select a subset of the lines. $(wordlist) takes three arguments: the index of the first word in the list to select, the index of the last word to select, and a space-separated list of words to select from. The indices are one-based, not zero-based, and $(wordlist) returns all the words in the given range. The body of this $(foreach) is just line 23, which uses $(info) once more to print the current line of the current verse.

Line 25 has the last bit of funny business in this makefile. We have to include a make rule in the makefile, or GNU make will complain *** No targets. Stop. after printing the lyrics. If we simply declare a rule with no commands, like all:, GNU make will complain Nothing to be done for `all’.. Therefore, we define a rule with a single “no-op” command that uses the bash built-in “:” to do nothing, combined with GNU make’s @ prefix to suppress printing the command itself.

And that’s it! Now you’ve got some experience with several of the built-in functions in GNU make — not bad for a Christmas day lark:

  • $(eval) for dynamic interpretation of text as makefile content
  • $(foreach), for iteration
  • $(if), for conditional expansion
  • $(info), for printing output
  • $(sort), for sorting a list
  • $(word), for selecting a single word from a list
  • $(wordlist), for selecting a range of words from a list

Now — where’s that figgy pudding? Merry Christmas!

post

UPDATE: SCons is Still Really Slow

A while back I posted a series of articles exploring the scalability of SCons, a popular Python-based build tool. In a nutshell, my experiments showed that SCons exhibits roughly quadratic growth in build runtimes as the number of targets increases:

Recently Dirk Baechle attempted to rebut my findings in an entry on the SCons wiki: Why SCons is not slow. I thought Dirk made some credible suggestions that could explain my results, and he did some smart things in his effort to invalidate my results. Unfortunately, his methods were flawed and his conclusions are invalid. My original results still stand: SCons really is slow. In the sections that follow I’ll share my own updated benchmarks and show where Dirk’s analysis went wrong.

Test setup

As before, I used genscons.pl to generate sample builds ranging from 2,000 to 50,000 targets. However, my test system was much beefier this time:

2013 2010
OS Linux Mint 14 (kernel version 3.5.0-17-generic) RedHat Desktop 3 (kernel version 2.4.21-58.ELsmp)
CPU Quad 1.7GHz Intel Core i7, hyperthreaded Dual 2.4GHz Intel Xeon, hyperthreaded
RAM 16 GB 2 GB
HD SSD (unknown)
SCons 2.3.0 1.2.0.r3842
Python 2.7.3 (system default) 2.6.2

Before running the tests, I rebooted the system to ensure there were no rogue processes consuming memory or CPU. I also forced the CPU cores into “performance” mode to ensure that they ran at their full 1.7GHz speed, rather than at the lower 933MHz they switch to when idle.

Revisiting the original benchmark

I think Dirk had two credible theories to explain the results I obtained in my original tests. First, Dirk wondered if those results may have been the result of virtual memory swapping — my original test system had relatively little RAM, and SCons itself uses a lot of memory. It’s plausible that physical memory was exhausted, forcing the OS to swap memory to disk. As Dirk said, “this would explain the increase of build times” — you bet it would! I don’t remember seeing any indication of memory swapping when I ran these tests originally, but to be honest it was nearly 4 years ago and perhaps my memory is not reliable. To eliminate this possibility, I ran the tests on a system with 16 GB RAM this time. During the tests I ran vmstat 5, which collects memory and swap usage information at five second intervals, and captured the result in a log.

Next, he suggested that I skewed the results by directing SCons to inherit the ambient environment, rather than using SCons’ default “sanitized” environment. That is, he felt I should have used env = Environment() rather than env = Environment(ENV = os.environ). To ensure that this was not a factor, I modified the tests so that they did not inherit the environment. At the same time, I substituted echo for the compiler and other commands, in order to make the tests faster. Besides, I’m not interested in benchmarking the compiler — just SCons! Here’s what my Environment declaration looks like now:

env = Environment(CC = 'echo', AR = 'echo', RANLIB = 'echo')

With these changes in place I reran my benchmarks. As expected, there was no change in the outcome. There is no doubt: SCons does not scale linearly. Instead the growth is polynomial, following an n1.85 curve. And thanks to the the vmstat output we can be certain that there was absolutely no swapping affecting the benchmarks. Here’s a graph of the results, including an n1.85 curve for comparison — notice that you can barely see that curve because it matches the observed data so well!

SCons full build runtime - click for larger view

For comparison, I used the SCons build log to make a shell script that executes the same series of echo commands. At 50,000 targets, the shell script ran in 1.097s. You read that right: 1.097s. Granted, the shell script doesn’t do stuff like up-to-date checks, etc., but still — of the 3,759s average SCons runtime, 3,758s — 99.97% — is SCons overhead.

I also created a non-recursive Makefile that “builds” the same targets with the same echo commands. This is a more realistic comparison to SCons — after all, nobody would dream of actually controlling a build with a straight-line shell script, but lots of people would use GNU make to do it. With 50,000 targets, GNU make ran for 82.469s — more than 45 times faster than SCons.

What is linear scaling?

If the performance problems are so obvious, why did Dirk fail to see them? Here’s a graph made from his test results:

SCons full build runtime, via D. Baechle - click for full size

Dirk says that this demonstrates “SCons’ linear scaling”. I find this statement baffling, because his data clearly shows that SCons does not scale linearly. It’s simple, really: linear scaling just means that the build time increases by the same amount for each new target you add, regardless of how many targets you already have. Put another way, it means that the difference in build time between 1,000 targets and 2,000 targets is exactly the same as the difference between 10,000 and 11,000 targets, or between 30,000 and 31,000 targets. Or, put yet another way, it means that when you plot the build time versus the number of targets, you should get a straight line with no change in slope at any point. Now you tell me: does that describe Dirk’s graph?

Here’s another version of that graph, this time augmented with a couple additional lines that show what the plot would look like if SCons were truly scaling linearly. The first projection is based on the original graph from 2,500 to 4,500 targets — that is, if we assume that SCons scales linearly and that the increase in build time between 2,500 and 4,500 targets is representative of the cost to add 2,000 more targets, then this line shows us how we should expect the build time to increase. Similarly, the second projection is based on the original graph between 4,500 and 8,500 targets. You can easily see that the actual data does not match either projection. Furthermore you can see that the slope of these projections is increasing:

SCons full build runtime with linear projections, via D. Baechle - click for full size

This shows the importance of testing at large scale when you’re trying to characterize the scalability of a system from empirical data. It can be difficult to differentiate polynomial from logarithmic or linear at low scales, especially once you incorporate the constant factors — polynomial algorithms can sometimes even give better absolute performance for small inputs than linear algorithms! It’s not until you plot enough data points at large enough values, as I’ve done, that it becomes easy to see and identify the curve.

What does profiling tell us?

Next, Dirk reran some of his tests under a profiler, on the very reasonable assumption that if there was a performance problem to be found, it would manifest in the profiling data — surely at least one function would demonstrate a larger-than-expected growth in runtime. Dirk only shared profiling data for two runs, both incremental builds, at 8,500 and 16,500 targets. That’s unfortunate for a couple reasons. First, the performance problem is less apparent on incremental builds than on full builds. Second, with only two datapoints it is literally not possible to determine whether growth is linear or polynomial. The results of Dirk’s profiling was negative: he found no “significant difference or increase” in any function.

Fortunately it’s easy to run this experiment myself. Dirk used cProfile, which is built-in to Python. To profile a Python script you can inject cProfile from the command-line, like this: python -m cProfile scons. Just before Python exits, cProfile dumps timing data for every function invoked during the run. I ran several full builds with the profiler enabled, from 2,000 to 20,000 targets. Then I sorted the profiling data by function internal time (time spent in the function exclusively, not in its descendents). In every run, the same two functions appeared at the top of the list: posix.waitpid and posix.fork. To be honest this was a surprise to me — previously I believed the problem was in SCons’ Taskmaster implementation. But I can’t really argue with the data. It makes sense that SCons would spend most of its time running and waiting for child processes to execute, and even that the amount of time spent in these functions would increase as the number of child processes increases. But look at the growth in runtimes in these two functions:

SCons full build function time, top two functions - click for full size

Like the overall build time, these curves are obviously non-linear. Armed with this knowledge, I went back to Dirk’s profiling data. To my surprise, posix.waitpid and posix.fork don’t even appear in Dirk’s data. On closer inspection, his data seems to include only a subset of all functions — about 600 functions, whereas my profiling data contains more than 1,500. I cannot explain this — perhaps Dirk filtered the results to exclude functions that are part of the Python library, assuming that the problem must be in SCons’ own code rather than in the library on which it is built.

This demonstrates a second fundamental principle of performance analysis: make sure that you consider all the data. Programmers’ intuition about performance problems is notoriously bad — even mine! — which is why it’s important to measure before acting. But measuring won’t help if you’re missing critical data or if you discard part of the data before doing any analysis.

Conclusions

On the surface, performance analysis seems like it should be simple: start a timer, run some code, stop the timer. Done correctly, performance analysis can illuminate the dark corners of your application’s performance. Done incorrectly — and there are many ways to do it incorrectly — it can lead you on a wild goose chase and cause you to squander resources fixing the wrong problems.

Dirk Baechle had good intentions when he set out to analyze SCons performance, but he made some mistakes in his process that led him to an erroneous conclusion. First, he didn’t run enough large-scale tests to really see the performance problem. Second, he filtered his experimental data in a way that obscured the existence of the problem. But perhaps his worst mistake was to start with a conclusion — that there is no performance problem — and then look for data to support it, rather than starting with the data and letting it impartially guide him to an evidence-based conclusion.

To me the evidence seems indisputable: SCons exhibits roughly quadratic growth in runtimes as the number of build targets increases, rendering it unusable for large-scale software development (tens of thousands of build outputs). There is no evidence that this is a result of virtual memory swapping. Profiling suggests a possible pair of culprits in posix.waitpid and posix.fork. I leave it to Dirk and the SCons team to investigate further; in the meantime, you can find my test harness and test results in my GitHub repo. If you can see a flaw in my methodology, sound off in the comments!

post

What’s new in GNU make 4.0?

After a little bit more than three years, the 4.0 release of GNU make finally arrived in October. This release packs in a bunch of improvements across many functional areas including debuggability and extensibility. Here’s my take on the most interesting new features.

Output synchronization

For the majority of users the most exciting new feature is output synchronization. When enabled, output synchronization ensures that the output of each job is kept distinct, even when the build is run in parallel. This is a tremendous boon to anybody that’s had the misfortune of having to diagnose a failure in a parallel build. This simple Makefile will help demonstrate the feature:

1
2
3
4
5
6
7
8
9
10
11
12
all: a b c
a:
@echo COMPILE a
@sleep 1 && echo a, part 1
@sleep 1 && echo a, part 2
@sleep 2 && echo a, part 3
b c:
@echo COMPILE $@
@sleep 1 && echo $@, part 1
@sleep 1 && echo $@, part 2
@sleep 1 && echo $@, part 3

Now compare the output when run serially, when run in parallel, and when run in parallel with –output-sync=target:

Serial Parallel Parallel with –output-sync=target
$ gmake
COMPILE a
a, part 1
a, part 2
a, part 3
COMPILE b
b, part 1
b, part 2
b, part 3
COMPILE c
c, part 1
c, part 2
c, part 3
$ gmake -j 4
COMPILE a
COMPILE b
COMPILE c
b, part 1
a, part 1
c, part 1
b, part 2
a, part 2
c, part 2
b, part 3
c, part 3
a, part 3
$ gmake -j 4 --output-sync=target
COMPILE c
c, part 1
c, part 2
c, part 3
COMPILE b
b, part 1
b, part 2
b, part 3
COMPILE a
a, part 1
a, part 2
a, part 3

Here you see the classic problem with parallel gmake build output logs: the output from each target is mixed up with the output from other targets. With output synchronization, the output from each target is kept separate, not intermingled. Slick! The output doesn’t match that of the serial build, unfortunately, but this is still a huge step forward in usability.

The provenance of this feature is especially interesting, because the idea can be traced directly back to me — in 2009, I wrote an article for CM Crossroads called Descrambling Parallel Build Logs. That article inspired David Boyce to submit a patch to GNU make in 2011 which was the first iteration of the –output-sync feature.

GNU Guile integration

The next major addition in GNU make 4.0 is GNU Guile integration, which makes it possible to invoke Guile code directly from within a makefile, via a new $(guile) built-in function. Naturally, since Guile is a general-purpose, high-level programming language, this allows for far more sophisticated computation from directly within your makefiles. Here’s an example that uses Guile to compute Fibonacci numbers — contrast with my Fibonacci in pure GNU make:

1
2
3
4
5
6
7
8
9
10
11
define FIBDEF
(define (fibonacci x)
(if (< x 2)
x
(+ (fibonacci (- x 1)) (fibonacci (- x 2)))))
#f
endef
$(guile $(FIBDEF))
%:
@echo $(guile (fibonacci $@))

Obviously, having a more expressive programming language available in makefiles will make it possible to do a great deal more with your make-based builds than ever before. Unfortunately I think the GNU make maintainers made a couple mistakes with this feature which will limit its use in practice. First, Guile was a poor choice. Although it’s a perfectly capable programming language, it’s not well-known or in wide use compared to other languages that they might have chosen — although you can find Scheme on the TIOBE Index, Guile itself doesn’t show up, and even though it is the official extension language of the GNU project, fewer than 25 of the GNU project’s 350 packages use Guile. If the intent was to embed a language that would be usable by a large number of developers, Python seems like the no-brainer option. Barring that for any reason, Lua seems to be the de facto standard for embedded programming languages thanks to its small footprint and short learning curve. Guile is just some weird also-ran.

Second, the make/Guile integration seem a bit rough. The difficulty arises from the fact that Guile has a rich type system, while make does not — everything in make is a string. Consequently, to return values from Guile code to make they must be converted to a string representation. For many data types — numbers, symbols and of course strings themselves — the conversion is obvious, and reversible. But for some data types, this integration does a lossy conversion which makes it impossible to recover the original value. Specifically, the Guile value for false, #f, is converted to an empty string, rendering it indistinguishable from an actual empty string return value. In addition, nested lists are flattened, so that (a b (c d) e) becomes a b c d e. Of course, depending on how you intend to use the data, each of these may be the right conversion. But that choice should be left to the user, so that we can retain the additional information if desired.

Loadable objects

The last big new feature in GNU make 4.0 is the ability to dynamically load binary objects into GNU make at runtime. In a nutshell, that load of jargon means that it’s possible for you to add your own “built-in” functions to GNU make, without having to modify and recompile GNU make itself. For example, you might implement an $(md5sum) function to compute a checksum, rather than using $(shell md5sum). Since these functions are written in C/C++ they should have excellent performance, and of course they can access the full spectrum of system facilities — file I/O, sockets, pipes, even other third-party libraries. Here’s a simple extension that creates a $(fibonacci) built-in function:

#include <stdio.h>
#include <gnumake.h>

int plugin_is_GPL_compatible;

int fibonacci(int n)
{
    if (n < 2) {
        return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}

char *gm_fibonacci(const char *nm, unsigned int argc, char **argv)
{
    char *buf  = gmk_alloc(33);
    snprintf(buf, 32, "%d", fibonacci(atoi(argv[0])));
    return buf;
}

int fibonacci_gmk_setup ()
{
    gmk_add_function ("fibonacci", gm_fibonacci, 1, 1, 0);
    return 1;
}

And here’s how you would use it in a makefile:

1
2
3
load ./fibonacci.so
%:
@echo $(fibonacci $@)

I’m really excited about this feature. People have been asking for additional built-in functions for years — to handle arithmetic, file I/O, and other tasks — but for whatever reason the maintainers have been slow to respond. In theory, loadable modules will enable people to expand the set of built-in functions without requiring the approval or involvement of the core team. That’s great! I only wish that the maintainers had been more responsive when we invited them to collaborate on the design, so we might have come up with a design that would work with both GNU make and Electric Make, so that extension authors need only write one version of their code. Ah well — que sera, sera.

Other features

In addition to the major feature described above there are several other enhancements worth mentioning here:

  • ::= assignment, equivalent to := assignment, added for POSIX compatibility.
  • != assignment, which is basically a substitute for $(shell), added for BSD compatibility.
  • –trace command-line option, which causes GNU make to print commnds before execution, even if they would normally be suppressed by the @ prefix.
  • $(file …) built-in function, for writing text to a file.
  • GNU make development migrated from CVS to git.

You can find the full list of updates in the NEWS file in the GNU make source tree.

Looking ahead

It’s great to see continued innovation in GNU make. Remember, this is a tool that’s now 25 years old. How much of the software you wrote 25 years ago is still in use and still in active development? I’d like to offer a heartfelt congratulations to Paul Smith and the rest of the GNU make team for their accomplishments. I look forward to seeing what comes next!

post

What’s new in ElectricAccelerator 7.1

ElectricAccelerator 7.1 hit the streets a last month, on October 10, just six months after the 7.0 release in April. There are some really cool new features in this release, which picks up right where 7.0 left off by adding even more ground-breaking performance features: schedule optimization and Javadoc caching. Here’s a quick look at each.

Schedule Optimization

The idea behind schedule optimization is really simple: we can reduce overall build duration if we’re smarter about the order in which jobs are run. In essense, it’s about packing the jobs in tighter, eliminating idle time in the middle of the build and reducing the “ragged right edge”. Here’s a side-by-side comparison of the same build, first using normal scheduling and then using schedule optimization. You can easily see that schedule optimization made the second build faster — an 11% improvement in this small, real-world example:

Build using naive scheduling -- click to view full size

Build using naive scheduling — click to view full size

Build using schedule optimization - click to view full size

Build using schedule optimization – click to view full size

If you study the two runs more closely, you can see how schedule optimization produced this improvement: key jobs, in particular the longest jobs, were started earlier. As a result, idle time in the middle of the build was reduced or eliminated entirely, and the right edge is much less uneven. But the best part? It’s completely automatic: all you have to do is run the build once for emake to learn its performance profile. Every subsequent build will leverage that data to improve build performance, almost like magic.

Not convinced? Here’s a look at the impact of schedule optimization on another, much bigger proprietary build (serial build time 18h25m). The build is already highly parallelizable and achieves an impressive 37.2x speedup with 48 agents — but schedule optimization can reduce the build duration by nearly 25% more, bringing to total speedup on 48 agents to an eye-popping 47.5x!

Build duration with naive and optimized scheduling

Build duration with naive and optimized scheduling

There’s another interesting angle to schedule optimization though. Most people will take the performance gains and use them to get a faster build on the same hardware. But you could go the other direction just as easily — keep the same build duration, but do it with dramatically less hardware. The following graph quantifies the savings, in terms of cores needed to achieve a particular build duration. Suppose we set a target build duration of 30 minutes. With naive scheduling, we’d need 48 agents to meet that target. With schedule optimization, we need only 38.

Resource requirements with naive and optimized scheduling - click for full size

Resource requirements with naive and optimized scheduling – click for full size

I’m really excited about schedule optimization, because it’s one of those rare features that give you something for nothing. It’s also been a long time coming — the idea was originally conceived of over three years ago, and it’s only now that we were able to bring it to fruition.

Schedule optimization works with emake on all supported platforms, with all emulation modes. It is not currently available for use with electrify.

Javadoc caching

The second major feature in Accelerator 7.1 is Javadoc caching. Again, it’s a simple idea: think “ccache”, but for Javadoc instead of compiles. This is the next phase in the evolution of Accelerator’s output reuse initiative, which began in the 7.0 release with parse avoidance. Like any output reuse feature, Javadoc caching works by capturing the product of a Javadoc invocation and storing it in a cache indexed by a hash of the inputs used — including the Java files themselves, the environment variables, and the command-line. In subsequent builds, emake will check those inputs again and if it computes the same hash, emake will used the cached results instead of running Javadoc again. On big Javadoc jobs, this can produce significant savings. For example, in the Android “Jelly Bean” open-source build, the main Javadoc invocation usually takes about five minutes. With Javadoc caching in Accelerator 7.1, that job runs in only about one minute — an 80% reduction! In turn that gives us a full one minute reduction in the overall build time, dropping the build from 13 minutes to 12 — nearly a 10% improvement:

Uncached Javadoc job in Android build - click for full image

Uncached Javadoc job in Android build – click for full image

Cached Javadoc job in Android build - click for full build

Cached Javadoc job in Android build – click for full image

Javadoc caching is available on Solaris and Linux only in Accelerator 7.1.

Looking ahead

I hope you’re as excited about Accelerator 7.1 as I am — for the second time this year, we’re bringing revolutionary new performance features to the table. But of course our work is never done. We’ve been hard at work on the “buddy cluster” concept for the next release of Accelerator. Hopefully I’ll be able to share some screenshots of that here before the end of the year. We’re also exploring acceleration for Bitbake builds like the Yocto Project. And last, but certainly not least, we’ll soon start fleshing out the next phase of output reuse in Accelerator — caching compiler invocations. Stay tuned!

post

The inverted parallel build bug

At some point most of you have encountered “the” parallel build problem: a build that works just fine when run serially, but breaks sometimes when run in parallel. You may have read my blog about how ElectricAccelerator automatically solves the classic parallel build problem. Recently I ran into the opposite problem in a customer’s build: a build that “works” when run in parallel, but breaks when run serially! If you’re lucky, this build defect will just cause occasional build failures. If you’re unlucky, it will silently corrupt your build output at random. With traditional GNU make this nasty bug is a nightmare to track down — if you even know that its present!

In contrast, the unique features in ElectricAccelerator make it trivial to find the defect — some might even say it’s fun (well, if you’re like me and you enjoy using powerful tools to do sophisticated analysis without breaking a sweat!). Read on to see how ElectricAccelerator makes it easy to diagnose and fix bugs in your build.

The inverted parallel build bug

Let’s start with a concrete example. Here’s a simple Makefile which (appears to) work when run in parallel, but which consistently fails serially:

1
2
3
4
5
6
7
8
all: reader writer
reader:
sleep 2
cat output
writer:
echo PASS > output

Assuming that output does not exist, executing this makefile serially will always produce an error:

$ gmake
sleep 2
cat output
cat: output: No such file or directory
gmake: *** [reader] Error 1

But if you execute this makefile in parallel, it appears to work!:

$ gmake -j 2
sleep 2
echo PASS > output
cat output
PASS

If we visualize the execution of these commands it’s easy to see why the parallel build seems to work:

Sample parallel execution timeline

At the beginning of the build, both reader and writer are started, more-or-less at the same time, because we told gmake to run two jobs at a time. reader has two commands, which are executed serially according to the semantics of make. While the sleep 2 is executing, the echo command in writer runs and completes. When the cat command in reader starts, it succeeds because output is ready-to-go.

Parallel execution is no guarantee

Some people will look at that explanation and think “Got it — always run this thing in parallel and we’re good!” Of course, you can’t really be 100% sure that everybody will remember to run the makefile in parallel. But even if you could, there’s a flaw in that reasoning: basically, your build has a race condition, and there’s no guarantee that you’ll “win” the race every time. For example, if your build server is heavily loaded, the sequence of events might look like this instead:

Alternative parallel execution timeline

Here, writer doesn’t get started until after the sleep command has finished — too late to save the cat command from failure.

Build failure is not the worst outcome

Before we move on to finding and fixing problems like this, let’s take a quick look at one more failure mode: incremental builds. In particular, check out what happens if output exists before the build starts, but with incorrect content (for example, stale data from an earlier build):

$ echo '*** FAIL ***' > output
$ gmake
sleep 2
cat output
*** FAIL ***
echo PASS > output
$ echo $?
0

That’s right — the build “succeeded”, because it produced no error messages and exited with a zero exit code. And yet, it produced completely bogus output. Ouch!

Somebody save me!

If you’re using ordinary GNU make, you’re in for a world of hurt with a problem like this. First, the only way to consistently reproduce the problem is to run the entire build serially — of course that probably takes a long time, or you wouldn’t have been using parallel builds in the first place. Second, there are no diagnostics built into gmake that could help you identify which job produces output. One option is to use strace to monitor filesystem accesses, but that will generate a mountain of data in a not-very-usable format. Plus, it imposes a substantial performance penalty — on top of the hit you’d already take for running the build serially. Yuck!

If you’re using Electric Make, this problem is embarrassingly easy to solve thanks to emake’s core features:

  • Consistent results: emake mimics serial execution with gmake, so you’ll always get a consistent result with this build. That means it will fail, the same way, every time, which means you’ll discover the problem immediately after it is introduced, not months or years later after it has become nearly impossible to tell which Makefile change introduced the defect.
  • Parallel speed: emake’s results match those of a serial gmake build, but its performance is more like that of a parallel gmake build — better, in most cases.
  • Annotated build logs: emake can generate an XML-enhanced version of the build output log which contains a record of every file accessed by every job in the build. This annotation file can easily be mined to identify pairs of jobs where the reader preceeds the writer.

You can use any general purpose XML parsing library to read annotation files, but it’s easy to use annolib, the high-performance annotation processing library we wrote to facilitate this kind of analysis. Since annolib is built into ElectricInsight, the easiest way to use it is to write the analysis as a custom Insight report. All you need to do is iterate through the files referenced in the build, looking for read operations (or, in this case, failed lookups) preceeding a write operation. Here’s the code:

global anno
set instances [list]

# Iterate over the files referenced in the build...

foreach filename [$anno files] {
    set readers [list]

    # Iterate over the operations performed on the file...

    foreach tuple [$anno file operations $filename] {
        foreach {job op dummy} $tuple { break }
        if { $op == "read" || $op == "failedlookup" } {
            # If this is a read operation, note the job that did the read.

            lappend readers $job
        } elseif {$op == "create" || $op == "modify" || $op == "truncate"} {
            # If this is a write operation but earlier jobs already read
            # the file, we've found a read-before-write instance.

            if { [llength $readers] } {
                lappend instances [list $readers $job $filename]
            }

            # After we see a write on this file we can move on to the next.

            break
        }
    }
}

# For each instance, print the filename, the writer, and each reader.

set result ""
foreach instance $instances {
    foreach {readers writer filename} $instance { break }
    set writerName [$anno job name $writer]
    set writerFile [$anno job makefile $writer]
    set writerLine [$anno job line $writer]
    append result "FILENAME:\n  $filename\n"
    append result "WRITER  :\n  $writerName ($writerFile:$writerLine)\n"
    append result "READERS :\n"
    foreach reader $readers {
        set readerName [$anno job name $reader]
        set readerFile [$anno job makefile $reader]
        set readerLine [$anno job line $reader]
        append result "  $readerName ($readerFile:$readerLine)\n"
    }
}

With a bit of additional boilerplate you can run this report from the command-line with Insight 4.0 (currently in limited beta). A couple notes on usage: you should instruct emake to generate lookup-level annotation, by adding –emake-annodetail=lookup to your invocation. And, you should run the build with the -k (keep-going) option — otherwise, the error in reader will prevent writer from running, and emake will not record filesystem usage for it. Once you have a suitable annotation file, here’s how the report looks for this build:

$ einsight --report=ReadBeforeWrite emake.xml
done.
FILENAME:
/home/ericm/test/output
WRITER :
writer (Makefile:7)
READERS :
reader (Makefile:3)

Voila! We’ve pinpointed the problem with barely 50 lines of code (including comments!). You can even see a solution: add writer as a prerequisite of reader, on line 3 of Makefile.

Show me what you can do with ElectricAccelerator

As you’ve seen, ElectricAccelerator makes it easy to identify and correct build problems that would otherwise be nearly impossible to root out. Hopefully you also see that this is just the tip of the iceberg — with consistent fast builds and the treasure trove of data available in annotation files, what other analysis could you do? To get started, you can download a free trial of ElectricAccelerator Developer Edition and check out the reports in ElectricInsight. You can also download the Read Before Write report for ElectricInsight from my GitHub repo. If you come up with something cool, tell me about it in the comments!

try_eade_button2

post

#pragma multi and rules with multiple outputs in GNU make

Recently we released ElectricAccelerator 6.2, which introduced a new bit of makefile syntax — #pragma multi — which allows you to indicate that a single rule produces multiple outputs. Although this is a relatively minor enhancement, I’m really excited about it because this it represents a new direction for emake development: instead of waiting for the GNU make project to add syntactic features and then following some time later with our emulation, we’re adding features that GNU make doesn’t have — and hopefully they will have to follow us for a change!

Unfortunately I haven’t done a good job articulating the value of #pragma multi. Unless you’re a pretty hardcore makefile developer, you probably look at this and think, “So what?” So let’s take a look at the problem that #pragma multi solves, and why #pragma multi matters.

Rules with multiple outputs in GNU make

The problem we set out to solve is simply stated: how can you specify to GNU make that one rule produces two or more output files? The obvious — but wrong — answer is the following:

1
2
foo bar: baz
touch foo bar

Unfortunately, this fragment is interpreted by GNU make as declaring two rules, one for foo and one for bar — it just so happens that the command for each rule creates both files. That will do more-or-less the right thing if you run a from-scratch, serial build:

$ gmake foo bar
touch foo bar
gmake: `bar' is up to date.

By the time GNU make goes to update bar, it’s already up-to-date thanks to the execution of the rule for foo. But look what happens when you run this same build in parallel:

$ gmake -j 2 foo bar
touch foo bar
touch foo bar

Oops! — the files were updated twice. No big deal in this trivial example, but it’s not hard to imagine a build where running the commands to update a file twice would produce bogus output, particularly if those updates could be happening simultaneously.

So what’s a makefile developer to do? In standard GNU make syntax, there’s only one truly correct way to create a rule with multiple outputs: pattern rules:

1
2
%.x %.y: %.in
touch $*.x $*.y

In contrast with explicit rules, GNU make interprets this fragment as declaring a single rule that produces two output files. Sounds perfect, but there’s a significant limitation to this solution: all of the output files must share a common sequence in the filenames (called the stem in GNU make parlance). That is, if your rule produces foo.x and foo.y, then pattern rules will work for you because the outputs both have foo in their names.

If your output files do not adhere to that naming limitation, then pattern rules can’t help you. In that case, you’re pretty much out of luck: there is no way to correctly indicate to GNU make that a single rule produces multiple output files. There are a variety of hacks you can try to coerce GNU make to behave properly, but each has its own limitations. The most common is to nominate one of the targets as the “primary”, and declare that the others depend on that target:

1
2
3
bar: foo
foo: baz
touch foo bar

Watch what happens when you run this build serially from scratch:

$ gmake foo bar
touch foo bar
gmake: Nothing to be done for `bar'.

Not bad, other than the odd “nothing to be done” message. At least the files weren’t generated twice. How about running it in parallel, from scratch?

$ gmake -j 2 foo bar
touch foo bar
gmake: Nothing to be done for `bar'.

Awesome! We still have the odd “nothing to be done” message, but just as in the serial build, the command was only invoked one time. Problem solved? Nope. What happens in an incremental build? If you’re lucky, GNU make happens to do the right thing and regenerate the files. But in one incremental build scenario, GNU make utterly fails to do the right thing. Check out what happens if the secondary output is deleted, but the primary is not:

$ rm -f bar && gmake foo bar
gmake: `foo' is up to date.
gmake: Nothing to be done for `bar'.

That’s right: GNU make failed to regenerate bar. If you’re very familiar with the build system, you might realize what had happened and think to either delete foo as well, or touch baz so that foo appears out-of-date (which would cause the next run to regenerate both outputs). But more likely at this point you just throw your hands up and do a full clean rebuild.

Note that all of the alternatives in vanilla GNU make have similar deficiencies. This kind of nonsense is why incremental builds have a bad reputation. This is why we created #pragma multi.

Rules with multiple outputs in Electric Make

By default Electric Make emulates GNU make, so it inherits all of GNU make’s limitations regarding rules with multiple outputs — with one critical exception. Even when running a build in parallel, Electric Make ensures that the output matches that produced by a serial GNU make build, which means that even the original, naive attempt will “work” for full builds regardless of whether the build is serial (single agent) or parallel (multiple agents).

Given that foundation, why did we bother with #pragma multi? There are a couple reasons:

  1. Correct incremental builds: with #pragma multi you can correctly articulate the relationships between inputs and outputs and thus ensure that all the outputs get rebuilt in incremental builds, rather than using kludges and hoping for the best.
  2. Out-of-the-box performance: although Electric Make guarantees correct output of the build, if you don’t have an up-to-date history file for the build you may waste time and compute resources running commands that don’t need to be run (work that will eventually be discarded when Electric Make detects the error). In the examples shown here the cost is negligible, but in real builds it could be significant.

Using #pragma multi is easy: just add the directive before the rule that will generate multiple outputs:

1
2
3
#pragma multi
foo bar: baz
touch foo bar

Watch what happens when this makefile is executed with Electric Make:

$ emake foo bar
touch foo bar

Note that there is no odd “is up to date” or “nothing to be done” message for bar — because Electric Make understands that both outputs are created by a single rule. Let’s verify that the build works as desired in the tricky incremental case that foiled GNU make — deleting bar without deleting foo:

$ rm -f bar && emake foo bar
touch foo bar

As expected, both outputs are regenerated: even though foo existed, bar did not, so the commands were executed.

Summary: rules with multiple outputs

Let’s do a quick review of the strategies for creating rules with multiple outputs. For simplicity we can group them into three categories:

  • #pragma multi
  • The naive approach, which does not actually create a single rule with multiple outputs at all.
  • Any of the various hacks for approximating rules with multiple outputs.

Here’s how each strategy fares across a variety of build modes:

Electric Make GNU make
Full (serial) Full (parallel) Incremental Full (serial) Full (parallel) Incremental
#pragma multi N/A
Naive
Hacks


The table paints a grim picture for GNU make: there is no way to implement rules with multiple outputs using standard GNU make which reliably gives both correct results and good performance across all types of builds. The naive approach generates the output files correctly in serial builds, but may fail in parallel builds. The various hacks work for full builds, but may fail in incremental builds. Even in cases where the output files are generated correctly, the build is marred by spurious “is up to date” or “nothing to be done for” messages — which is why most of the entries in the GNU make side are yellow rather than green.

In contrast, #pragma multi allows you to correctly generate multiple outputs from a single rule, for both full and incremental builds, in serial and in parallel. The naive approach also “works” with Electric Make, in that it will produce correct output files, but like GNU make the build is cluttered with spurious warnings. And, unless you have a good history file, the naive approach can trigger conflicts which may negatively impact build performance. Finally, despite its sophisticated conflict detection and correction smarts, even Electric Make cannot ensure correct incremental builds when you’ve implemented one of the multiple output hacks.

So there you have it. This is why we created #pragma multi: without it, there’s just no way to get the job done quickly and reliably. You should give ElectricAccelerator a try.

try_eade_button2

post

Fixing recursive make

Recursive make is one of those things that everybody loves to hate. It’s even been the subject of one of those tired “… Considered Harmful” diatribes. According to popular opinion, recursive make will sap performance from your build, make it nigh impossible to ensure correctness in parallel builds, and may render the user sterile. OK, maybe not that last one. But seriously, the arguments against recursive make are legion, and deeply entrenched. The problem? They’re flawed. That’s because they assume there’s only one way to implement recursive make — when the submake is invoked, the parent make is blocked until the submake completes. That’s how almost everybody does it. But in Electric Make, part of ElectricAccelerator, we developed a novel new approach called non-blocking recursive make. This design eliminates the biggest problems attributed to recursive make, without requiring a painful and costly conversion of your build system to non-recursive make.

The problem with traditional recursive make

There’s really just two problems at the heart of complaints with traditional recursive make: first, there’s no way to ensure correctness of a parallel recursive make based build without overserializing the submakes, because there’s no way to articulate dependencies between individual targets in different submakes. That means you can’t have a dependency graph that is both correct and precise. Instead you either leave out the critical dependency entirely, which makes parallel (ie, fast) builds unreliable; or you serialize submakes in their entirety, which shackles build performance because no part of a submake with even a single dependency on some portion of an earlier submake can begin until the entire ealier submake completes. Second, even if there were a way to specify precise dependencies between targets in different submakes, most versions of make have implemented recursive make such that the parent make is blocked from proceeding until the submake has completed. Consider a typical use of recursive make with implicit serializations between submakes:

1
2
3
4
all:
@for dir in util client server ; do \
$(MAKE) -C $$dir; \
done

Each submake compiles a bunch of source files, then links them together into a library (util) or an executable (client and server). The only actual dependency between the work in the three make instances is that the client and server programs need the util library. Everything else is parallelizable, but with traditional recursive make, gmake is unable to exploit that parallelism: all of the work in the util submake must finish before any part of the client submake begins!

Conflict detection and non-blocking recursive make

If you’re familiar with Electric Make, you already know how it solves the first half of the recursive make problem: conflict detection and correction. I’ve written about conflict detection before, but here’s a quick recap: using the explicit dependencies given in the makefiles and information about the files accessed as each target is built, emake is able to dynamically determine when targets have been built too early due to missing explicit dependencies, and rerun those targets to generate the correct output. Electric Make can ensure the correctness of parallel builds even in the face of incomplete dependencies, even if the missing dependencies are between targets in different submakes. That means you need not serialize entire submakes to ensure the build will run correctly in parallel.

Like an acrobat’s safety net, conflict detection allows us to consider solutions to the other half of the problem that would otherwise be considered risky, if not outright madness. In fact, our solution would not be possible without conflict detection: non-blocking recursive make. This is analogous to the difference between blocking and non-blocking I/O: rather than waiting for a recursive make to finish, emake carries on executing subsequent commands in the build immediately, including other recursive makes. Conflict detection ensures that only the commands in each submake which require serialization are executed sequentially, so the build runs as quickly as possible, but the final build output is identical to a serial build.

The impact of this change is dramatic. Here I’ve plotted the execution of the simple build defined above on four cores, using both gmake (normal recursive make) and emake (non-blocking recursive make):

Recursive make build with gmake


Recursive make build with emake

Electric Make is able to execute this build about 20% faster than gmake, with no changes to the Makefiles or the execution environment. emake is literally able to squeeze more parallelism out of recursive-make-based builds than gmake. In fact, we can precisely quantify just how much more parallelism emake gets through an application of Amdahl’s law. First, we compute the best possible speedup for the build — that’s just the serial runtime divided by the best possible parallel runtime, which we can figure out through analysis of the depedency graph and runtime of individual jobs in the build (the Longest Serial Chain report in ElectricInsight can do this for you). Then we can compute the parallelizable portion P of the build by plugging the speedup S into this equation: P = 1 - (1 / S). Here’s how that works out for gmake and emake:

gmake emake
Serial baseline 65s 65s
Best build time 13.5s 7.5s
Best speedup 4.8x 8.7x
Parallel portion 79% 89%

On this build, non-blocking recursive make increases the parallel portion of the build by 10%. That may not seem like much, but Amdahl’s law shows how dramatically that difference affects the speedup you can expect as you apply more cores:

Implementation

On the backend, non-blocking recursive make is handled by conflict detection — the jobs from the recursive make are checked for conflicts in the serial order defined by the makefile structure. Any issues caused by aggressively running recursive makes early are detected during the conflict check, and the target that ran too early is rerun to generate the correct result.

On the frontend, emake uses a strategy that is at once both brilliant in its simplicity, and diabolical in its trickery. It starts with an environment variable. When emake is invoked recursively, it checks the value of EMAKE_BUILD_MODE. If it is set to node, emake runs in so-called stub mode: rather than executing the submake (parsing the makefile and building targets), emake captures the invocation context (working directory, command-line and environment) in a file on disk, prints a “magic” string and exits with a zero status code.

The file containing the invocation context is identified by a second environment variable, ECLOUD_RECURSIVE_COMMAND_FILE. The Accelerator agent (which handles invoking commands on behalf of emake) checks for the presence of that file after every command that is run. If it is found, the agent relays the content to the toplevel emake invocation, where a new make instance is created to represent the submake invocation. That instance comes with it’s own parse job of course, which gets inserted into the queue of jobs. Some (short) time later, the parse job will run, discover whatever work must be run by the submake, and create additional rule jobs.

The magic string — EMAKE_FNORD — serves as a placeholder in the stdout stream for the jobs, so emake can figure out which portion of the output text comes before and which portion comes after the submake. This ensures that the build output log is identical to that generated by a serialized gmake build. For example, given the following rule that invokes a submake, you’d expect to see the “Before” and “After” messages printed before and after the output generated by commands in the submake itself:

1
2
3
4
all:
@echo Before util ; \
@$(MAKE) -C util ; \
@echo After util

With non-blocking recursive make, the submake has not actually executed when the “echo After util” command runs. If emake doesn’t account for that reordering, both the “Before” and “After” messages will appear before any of the output from the submake. EMAKE_FNORD allows emake to “stitch” the output together so the build log matches a serial log.

Limitations

Conflict detection and non-blocking recursive make together solve the main problems associated with recursive make. But there are a couple scenarios where non-blocking recursive make does not work well. Fortunately, these are uncommon in practice and easily addressed.

Capturing recursive make stdout

The first scenario is when the build captures the output of the recursive make invocation, rather than letting it print to stdout as normal. Since emake defers the execution of the submake and prints only EMAKE_FNORD to stdout, this will not work. There are two reasons you might do this: first, you might want to have separate build logs for each submake, to simplify error detection and management. In this situation, the simplest workaround is to remove the redirection and instead us emake’s annotated build log, an XML version of the build output log which can be easily processed using standard tools. Second, you may be using make as a text-processing tool (sort of a “poor man’s” Perl), rather than for building per se:

1
2
3
all:
@$(MAKE) -f genlist.mk > objects.txt
@cat objects.txt | xargs rm

In this case, the workaround is to explicitly force emake to run in so-called “local” mode, which means emake will handle the recursive make invocation as a blocking invocation, just like traditional make would. You can force emake into local mode by adding EMAKE_BUILD_MODE=local to the environment before the recursive make invocation.

Immediate consumption of build products

The second scenario is when the build consumes the product of the submake in the same command that contains the invocation. For example:

1
2
all:
@$(MAKE) -C sub foo && cp sub/foo ./foo

Here the build assumes that the output files generated by the submake will be available for use immediately after the submake completes. Obviously this is not the case with non-blocking recursive make — when the invocation of $(MAKE) -C sub foo completes, only the submake stub has actually finished. The build products will not be available until after the submake is actually processed later. Note that in this build both the recursive make invocation and the commands that use the build products from that invocation are treated as a single command from the perspective of make: make actually invokes the shell, and the shell then runs the recursive make and cp commands.

The workaround is simple: split the consumer into a distinct command, from the perspective of make:

1
2
3
all:
@$(MAKE) -C sub foo
@cp sub/foo ./foo

With that trivial change, emake is able to treat the cp as a continuation job, which can be serialized against the completion of the recursive make as needed.

A fix for recursive make

For years, people have heaped scorn and criticism on recursive make. They’ve nearly convinced everybody that even considering its use is automatically wrong — you probably can’t help feeling a little bit guilty when you use recursive make. But the reality is that recursive make is a reasonable way to structure a large build. You just need a better make. With conflict detection and non-blocking recursive make, Electric Make has fixed the problems usually associated with recursive make, so you can get parallel builds that are both fast and correct. Give it a try!

post

Another confusing conflict in ElectricAccelerator

After solving the case of the confounding conflict, my user came back with another scenario where ElectricAccelerator produced an unexpected (to him) conflict:

1
2
3
4
5
6
all:
@$(MAKE) foo
@cp foo bar
foo:
@sleep 2 && echo hello world > foo

If you run this build without a history file, using at least two agents, you will see a conflict on the continuation job that executes the cp foo bar command, because that job is allowed to run before the job that creates foo in the recursive make invocation. After one run of course, emake records the dependency in history, so later builds don’t make the same mistake.

This situation is a bit different from the symlink conflict I showed you previously. In that case, it was not obvious what caused the usage that triggered the conflict (the GNU make stat cache). In this case, it’s readily apparent: the continuation job reads (or attempts to read) foo before foo has been created. That’s pretty much a text-book example of the sort of thing that causes conflicts.

What’s surprising in this example is that the continuation job is not automatically serialized with the recursive make that precedes it. In a very real sense, a continuation job is an artificial construct that we created for bookkeeping reasons internal to the implementation of emake. Logically we know that the commands in the continuation job should follow the commands in the recursive make. In fact it would be absolutely trivial for emake to just go ahead and stick in a dependency to ensure that the continuation is not allowed to start until after the recursive make finishes, thereby avoiding this conflict even when you have no history file.

Given a choice between two strategies that both produce correct output, emake uses the strategy that produces the best performance in the general case.

Absolutely trivial to do, yes — but also absolutely wrong. Not for correctness reasons, this time, but for performance. Remember, emake is all about maximizing performance across a broad range of builds. Given a choice between two strategies that both produce correct output, emake uses the strategy that produces the best performance in the general case. For continuation jobs, that means not automatically serializing the continuation against the preceding recursive make. I could give you a wordy, theoretical explanation, but it’s easier to just show you. Suppose that your makefile looked like this instead of the original — the difference here is that the continuation job itself launches another recursive make, rather than just doing a simple cp:

1
2
3
4
5
6
7
8
9
all:
@$(MAKE) foo
@$(MAKE) bar
foo:
@sleep 2 && echo hello world > foo
bar:
@sleep 2 && echo goodbye > bar

Hopefully you agree that the ideal execution of this build would have both foo and bar running in parallel. Forcing the continuation job to be serialized with the preceding recursive make would choke the performance of this build. And just in case you’re thinking that emake could be really clever by looking at the commands to be executed in the continuation job, and only serializing “when it needs to”: it can’t. First, that would require emake to implement an entire shell syntax parser (or several, really, since you can override SHELL in your makefile). Second, even if emake had that ability, it would be thwarted the instant the command is something like my_custom_script.pl — there’s no way to tell what will happen when that gets invoked. It could be a simple filesystem access. It could be a recursive make. It could be a whole series of recursive makes. Even when the command is something you think you recognize, can emake really be sure? Maybe cp is not our trustworthy standard Unix cp, but something else entirely.

Again, all is not lost for this user. If you want to avoid this conflict, you have a couple options:

  1. Use a good history file from a previous build. This is the simplest solution. You’ll only get conflicts in this build if you run without a history file.
  2. Refactor the makefile. You can explicitly describe the dependency between the commands in the continuation job and the recursive make by refactoring the makefile so that the stuff in the continuation is instead its own target, thus taking the decision out of emake’s hands. Here’s one way to do that:
    1
    2
    3
    4
    5
    6
    7
    8
    all: do_foo
    @cp foo bar
    do_foo:
    @$(MAKE) foo
    foo:
    @sleep 2 && echo hello world > foo

Either of these will eliminate the conflict from your build.

post

ElectricAccelerator and the Case of the Confounding Conflict

A user recently asked me why ElectricAccelerator reports a conflict in this simple build, when executed without a history file from a previous run:

1
2
3
4
5
6
7
all: foo symlink_to_foo
foo:
@sleep 2 && echo hello world > foo
symlink_to_foo:
@ln -s foo symlink_to_foo

Specifically, if you have at least two agents, emake will report a conflict between symlink_to_foo and foo, indicating that symlink_to_foo somehow read or otherwise accessed foo during execution! But ln does not access the target of a symlink when creating the symlink — in fact, you can even create a symlink to a non-existent file if you like. It seems obvious that there should be no conflict. What’s going on?

To understand why this conflict occurs, you have to wrap your head around two things. First, there’s more going on during a gmake-driven build than just the commands you see gmake invoke. That causes the usage that provokes the conflict. Second, emake considers a serial gmake build the “gold standard” — if a serial gmake build produces a particular result, so too must emake. That’s why the additional usage must result in a conflict.

In this case, the usage that triggers the conflict comes from management of the gmake stat cache. This is a gmake feature that was added to improve performance by avoiding redundant calls to stat() — once you’ve stat()‘d a file once, you don’t need to do it again. Unless the file is changed of course, which happens quite a lot during a build. To keep the stat cache up-to-date as the build progresses, gmake re-stat()‘s each target after it finishes running the commands for the target. So after the commands for symlink_to_foo complete, gmake stat()‘s symlink_to_foo again, using the standard stat() system call, which follows the symlink (in contrast to lstat(), which does not follow the symlink). That means gmake will actually cache the attributes of foo for symlink_to_foo.

To ensure compatibility with gmake, emake has to do the same. In Accelerator parlance, that means we get read usage on symlink_to_foo (because you have to read the symlink itself to determine the target of the symlink), and lookup usage on foo. The lookup on foo causes the conflict, because, of course, you will get a different result if you lookup foo before the job that creates it than you would get if you do the lookup after that job. Before the job, you’ll find that foo does not exist, obviously; after, you’ll find that it does.

But what difference does that make, really? In truth, if there’s no detectable difference in behavior, then it doesn’t matter at all. And in the example build there is no detectable difference — the build output is the same regardless of when exactly you stat() symlink_to_foo relative to when foo is created. But with a small modification to the build, it is suddenly becomes possible to see the impact:

1
2
3
4
5
6
7
8
9
10
all: foo symlink_to_foo reader
foo:
@sleep 2 && echo hello world > foo
symlink_to_foo:
@ln -s foo symlink_to_foo
reader: foo symlink_to_foo
@echo newer prereqs are: $?

Compare the output when this build is run serially with the output when the build is run in parallel — and note that I’m using gmake, so you can be certain I’m not trying to trick you with some peculiarity of emake’s implementation:

You can plainly see the difference: in the parallel build gmake stat()‘s symlink_to_foo before foo exists, so the stat cache records symlink_to_foo as non-existent. Then when gmake generates the value of $? for reader, symlink_to_foo is excluded, because non-existent files are never considered newer than existing files. In the serial build, gmake stat()‘s symlink_to_foo after foo has been created, so the stat cache indicates that symlink_to_foo exists and is newer than reader, so it is included in $?.

Hopefully you see now both what causes the conflict, and why it is necessary. The conflict occurs because of lookup usage generated when updating the stat cache. The conflict is necessary to ensure that the build output matches that produced by a serial gmake — the “gold standard” for build correctness. If no conflict is declared, there is the possibility for a detectable difference in build output compared to serial gmake.

However, you might be thinking that although it makes sense to treat this as a conflict in the general case, isn’t it possible to do something smarter in this specific case? After all, the orignal example build does not use $?, and without that there isn’t any detectable difference in the build output. So why not skip the conflict?

The answer is simple, if a bit disappointing. In theory it may be possible to elide the conflict by checking to see if the symlink is used by a later job in a manner that would produce a detectable difference (for example, by scanning the commands for subsequent targets for references to $?), but in reality the logistics of that check are daunting, and I’m not confident that we could guarantee correct behavior in all cases.

Fortunately all is not lost. If you wish to avoid this conflict, you have several options:

  1. Use a good history file from a previous build. This is the most obvious solution. You’ll only get conflicts if you run without a history file.
  2. Add an explicit dependency. If you make foo an explicit prereq of symlink_to_foo, then you will avoid the conflict. Here’s how that would look:
    1
    symlink_to_foo: foo
  3. Change the serial order. If you reorder the makefile so that symlink_to_foo has an earlier serial order than foo you will avoid the conflict. That just requires a reordering of the prereqs of all:
    1
    all: symlink_to_foo foo

Any one of these will eliminate the conflict from your build, and you’ll enjoy fast and correct parallel builds.

Case closed.

post

Makefile hacks: automatically split long command lines

If you’ve worked on a large build system you’ve probably bumped into this error, or one like this:

gmake: execvp: /bin/sh: Argument list too long

This error means the length of some command-line in your makefile has grown past the system limit, which is typically in the 32 to 256 kilobyte range. It’s surprisingly easy to hit that limit. You start with a small list of object files to be linked together. Over time you add more, and the command-line gets a little longer. Add a few more and it gets longer still. Before you know it you have a monster command-line and your build starts failing.

The solution to this problem is simple: split the long command-line into several shorter command-lines. For example, ar r libraries/lib.a objects/foo.o objects/bar.o objects/baz.o objects/boo.o objects/bang.o becomes something like this:

ar r libraries/lib.a objects/foo.o objects/bar.o
ar r libraries/lib.a objects/baz.o objects/boo.o
ar r libraries/lib.a objects/bang.o

Simple in theory, but tedious to do by hand. And doing it manually is like putting a ticking time-bomb into your makefile — it’s only a matter of time before your build grows enough that you have to go through this exercise again.

I recently ran across a clever solution that exploits the $(eval) function in GNU make to split long command-lines automatically, eliminating the tedium and the time-bomb. After I show you the solution, I’ll explain it piece-by-piece.

The max_args function

The solution is a user-defined function called max_args that splits long command-lines into equal-length chunks:

1
2
3
4
5
6
7
8
9
define max_args
$(eval _args:=)
$(foreach obj,$3,$(eval _args+=$(obj))$(if $(word $2,$(_args)),$1$(_args)$(EOL)$(eval _args:=)))
$(if $(_args),$1$(_args))
endef
define EOL
endef

And an example of its use:

1
2
3
OBJS:=a b c d e f g h
all:
@$(call max_args,echo,2,$(OBJS))

The max_args function takes three parameters: the base command-line, the number of arguments per “chunk”, and the complete list of arguments. It expands to a series of command-lines — one for each chunk of arguments.

The trick behind max_args is the use of $(eval) to update a variable as a side-effect of gmake’s regular variable expansion activity. If you’re not familiar with gmake variable expansion, here’s a quick rundown: when gmake finds a variable or function reference, like $(something), it replace the entire reference with an expanded value. In the case of a variable that’s just the value of the variable. Most variables in gmake are recursive which means that if the variable value itself contains embedded variable references, those will be expanded as well, recursively. In the case of a function, gmake evaluates the function, and replaces the reference with the computed value.

The meat of max_args is on line 3. It starts with the $(foreach) function, which evaluates its third argument, the body of the loop, once for each word in its second argument — in this case, the list of objects passed in the call to max_args.

In max_args, the loop body has two components. The first is a call to $(eval), which simply appends the current value of the loop variable to an accumulator called _args.

The second component of the loop body uses $(if) and $(word) to check the length of _args. The $(word) function returns the nth word from a list, or an empty string if there are fewer than n words in the list. The $(if) function expands its second argument (the then clause) only if its first argument (the condition) expands to a non-empty string, so together these functions check if _args has the desired number of words, and if so the then clause of the $(if) is expanded.

The then clause of this $(if) has two components. The first constructs a completed command-line by concatenating the base command-line, here given by $1, the first argument to the original max_args call; the accumulated arguments; and a newline character. Thanks to the rules of gmake expansion, this command-line is added to the overall expansion result for the max_args function. The second part of the then clause uses $(eval) to reset the accumulator

If the chunk size does not evenly divide the number of arguments, the stragglers are emitted in a final command-line on the last line of max_args.

Limitations

max_args is handy but it has one significant limitation: command-line length limits are based on the number of bytes in the command-line, not the number of words, in it. Unfortunately, gmake has no built-in way to count the number of characters in a string. gmake does provide the $(words) built-in, so that’s what max_args uses. That just means that to use it effectively you have to take a guess at the number of arguments that will fit in a single command-line, for example by dividing the length limit by the average number of characters in each argument, then subtracting some to allow some buffer for outliers.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: