What’s new in ElectricAccelerator 7.2?

Wow, time flies! Another six months has come and gone, which means it’s time for another ElectricAccelerator feature release. Right on cue, ElectricAccelerator 7.2 dropped a couple weeks back on April 17, 2014. There’s no unifying theme to this release — actually we’re in the middle of a much more ambitious project that I can’t say much about quite yet, but over the last several months we’ve made a number of improvements to Accelerator core functionality, and we’re eager to get those out users. Thus we have the 7.2 release, with the following marquee features: dramatic Linux performance improvements for certain use cases, a key enhancement to our parse avoidance feature to improve accuracy, and expanded Linux platform support. Read on for the details.

Linux performance improvements

Accelerator 7.2 incorporates two performance improvements for Linux-based builds. The first is a redesign of the integration between the Electric File System (EFS) and the Linux kernel, which reduces lock contention in the EFS. Consequently, any build job that makes concurrent accesses to the filesystem should see some performance improvement. In one example, a build that executed two tar processes simultaneously in one job saw runtime drop from 11 minutes with Accelerator 7.1 to just 6 minutes with Accelerator 7.2, nearly 2x faster!

The second improvement is full support for the Linux d_type extension to the readdir() system call. On most Unix and Unix-like systems, the readdir() system call only gives the application programmer a couple pieces of information: the names of the files in a directory, and the inode number for those files. On Linux, filesystems may also include file type information in the results, which enables programs to operate more efficiently in some cases as they can avoid the overhead of an addition stat() call to get the file type. On a local filesystem that optimization is interesting but not necessarily game-changing; but with a distributed network filesystem like the EFS, that optimization can result in enormous improvements. In our benchmarks we saw jobs using find to scan large directory structures execute nearly 9x faster with Accelerator 7.2 versus Accelerator 7.1

Parse avoidance update

For large builds with many or complicated makefiles, Accelerator’s parse avoidance feature is a game-changer by dramatically reducing the time necessary to read and interpret makefiles at the start of a build. On the Android KitKat open-source build, parse avoidance reduces a 4 minute parse job to about 5 seconds — nearly 50x faster!. Since its introduction in Accelerator 7.0, parse avoidance has delivered jaw-dropping improvements like that in a wide variety of builds.

But use of this feature has been problematic in one specific use case: makefiles that use wildcards in prerequisite lists, with either $(wildcard) or $(shell). In certain circumstances this makefile anti-pattern could cause emake to produce “false positives” from the parse avoidance cache, such that emake would incorrectly use a previously cached parse result when it should have instead reparsed the makefile. In Accelerator 7.2 we’ve extended the #pragma cache syntax so that you can inform emake of the wildcard patterns to consider when determining cache suitability. This will enable even more users to enjoy the benefits of parse avoidance, without sacrificing reliability or performance. Usage instructions can be found in the Electric Make User’s Guide.

New platform support

Finally, with Accelerator 7.2 we’ve further expanded our already sweeping platform support to include RedHat Enterprise Linux 6.5, Ubuntu 13.04 and Windows Server 2012. This may seem like a modest increment, but I’m particularly excited about this update not for the what but for the who: you see, this is the first time that somebody other than myself made all the updates needed to support a new version of Linux, start to finish. With another set of hands to do that work we should be able to add support for new Linux platforms much more quickly in the future, which is welcome news indeed (thanks, Tim)!

What’s next?

Years ago, I thought that we would eventually get to the point that Accelerator was “done” and we’d have nothing left to do. How young and foolish I was! In reality, it seems that the TODO list only gets longer and longer. We’re still working hard on the “buddy cluster” concept, as well as Bitbake and ninja integration. And of course, we’re always working to improve performance — more on that in a future post.

ElectricAccelerator 7.2 is available immediately for existing customers. Contact support@electric-cloud.com to get the bits. New users can contact sales@electric-cloud.com for a evaluation.

The Twelve Days of Christmas, GNU make style

Well, it’s Christmas Day in the States today, and while we’re all recovering from the gift-opening festivities, I thought this would be the perfect time for a bit of fun with GNU make. And what better subject matter than the classic Christmas carol “The Twelve Days of Christmas”? Its repetitive structure is perfect for demonstrating how to use several of GNU make’s built-in functions for iteration, selection and sorting. This simple makefile prints the complete lyrics to the song:

L01=Twelve drummers drumming,
L02=Eleven pipers piping,
L03=Ten lords-a-leaping,
L04=Nine ladies dancing,
L05=Eight maids-a-milking,
L06=Seven swans-a-swimming,
L07=Six geese-a-laying,
L08=Five golden rings,
L09=Four calling birds,
L10=Three french hens,
L11=Two turtle doves, and
L12=A partridge in a pear tree!
LINES=12 11 10 09 08 07 06 05 04 03 02 01
DAYS=twelfth eleventh tenth ninth \
eighth seventh sixth fifth \
fourth third second first
$(foreach n,$(LINES),\
$(if $(X),$(info ),$(eval X=X))\
$(info On the $(word $n,$(DAYS)) day of Christmas,)\
$(info my true love gave to me)\
$(foreach line,$(wordlist $n,12,$(sort $(LINES))),\
$(info $(L$(line)))))
all: ; @:

By count, most of the lines here just declare variables, one for each item mentioned in the song. Note how the items are ordered: the last item added is given the lowest index. That means that to construct each verse we simply enumerate every item in the list, in order, starting with the new item in each verse.

Line 18 is where the real meat of the makefile begins. Here we use GNU make’s foreach function to iterate through the verses. $(foreach) takes three arguments: a name for the iteration variable, a space-separated list of words to assign to the iteration variable in turn, and a body of text to expand repeatedly, once for each word in the list. Here, the list of words is given by LINES, which lists the starting line for each verse, in order — that is, the first verse starts from line 12, the second from line 11, etc. The text to expand on each iteration is all the text on lines 19-23 of the makefile — note the use of backslashes to continue each line to the next.

Line 19 uses several functions to print a blank line before starting the next verse, if we’ve printed a verse already: the $(if) function, which expands its second argument if its first argument is non-empty, and its third argument if its first argument is empty; the $(info) function to print a blank line; and the $(eval) function to set the flag variable. The first time this line is expanded, X does not exist, so it expands to an empty string and the $(if) picks the “else” branch. After that, X has a value, so the $(if) picks the “then” branch.

Lines 20 and 21 again use $(info) to print output — this time the prelude for the verse, like “On the first day of Christmas, my true love gave to me”. The ordinal for each day is pulled from DAYS using the $(word) function, which extracts a specified word, given by its first argument, from the space-separated list given as its second argument. Here we’re using n, the iteration variable from our initial $(foreach) as the selector for $(word).

Line 22 uses $(foreach) again, this time to iterate through the lines in the current verse. We use line as the iteration variable. The list of words is given again by LINES except now we’re using $(sort) to reverse the order, and $(wordlist) to select a subset of the lines. $(wordlist) takes three arguments: the index of the first word in the list to select, the index of the last word to select, and a space-separated list of words to select from. The indices are one-based, not zero-based, and $(wordlist) returns all the words in the given range. The body of this $(foreach) is just line 23, which uses $(info) once more to print the current line of the current verse.

Line 25 has the last bit of funny business in this makefile. We have to include a make rule in the makefile, or GNU make will complain *** No targets. Stop. after printing the lyrics. If we simply declare a rule with no commands, like all:, GNU make will complain Nothing to be done for `all’.. Therefore, we define a rule with a single “no-op” command that uses the bash built-in “:” to do nothing, combined with GNU make’s @ prefix to suppress printing the command itself.

And that’s it! Now you’ve got some experience with several of the built-in functions in GNU make — not bad for a Christmas day lark:

  • $(eval) for dynamic interpretation of text as makefile content
  • $(foreach), for iteration
  • $(if), for conditional expansion
  • $(info), for printing output
  • $(sort), for sorting a list
  • $(word), for selecting a single word from a list
  • $(wordlist), for selecting a range of words from a list

Now — where’s that figgy pudding? Merry Christmas!

UPDATE: SCons is Still Really Slow

A while back I posted a series of articles exploring the scalability of SCons, a popular Python-based build tool. In a nutshell, my experiments showed that SCons exhibits roughly quadratic growth in build runtimes as the number of targets increases:

Recently Dirk Baechle attempted to rebut my findings in an entry on the SCons wiki: Why SCons is not slow. I thought Dirk made some credible suggestions that could explain my results, and he did some smart things in his effort to invalidate my results. Unfortunately, his methods were flawed and his conclusions are invalid. My original results still stand: SCons really is slow. In the sections that follow I’ll share my own updated benchmarks and show where Dirk’s analysis went wrong.

Test setup

As before, I used genscons.pl to generate sample builds ranging from 2,000 to 50,000 targets. However, my test system was much beefier this time:

2013 2010
OS Linux Mint 14 (kernel version 3.5.0-17-generic) RedHat Desktop 3 (kernel version 2.4.21-58.ELsmp)
CPU Quad 1.7GHz Intel Core i7, hyperthreaded Dual 2.4GHz Intel Xeon, hyperthreaded
RAM 16 GB 2 GB
HD SSD (unknown)
SCons 2.3.0 1.2.0.r3842
Python 2.7.3 (system default) 2.6.2

Before running the tests, I rebooted the system to ensure there were no rogue processes consuming memory or CPU. I also forced the CPU cores into “performance” mode to ensure that they ran at their full 1.7GHz speed, rather than at the lower 933MHz they switch to when idle.

Revisiting the original benchmark

I think Dirk had two credible theories to explain the results I obtained in my original tests. First, Dirk wondered if those results may have been the result of virtual memory swapping — my original test system had relatively little RAM, and SCons itself uses a lot of memory. It’s plausible that physical memory was exhausted, forcing the OS to swap memory to disk. As Dirk said, “this would explain the increase of build times” — you bet it would! I don’t remember seeing any indication of memory swapping when I ran these tests originally, but to be honest it was nearly 4 years ago and perhaps my memory is not reliable. To eliminate this possibility, I ran the tests on a system with 16 GB RAM this time. During the tests I ran vmstat 5, which collects memory and swap usage information at five second intervals, and captured the result in a log.

Next, he suggested that I skewed the results by directing SCons to inherit the ambient environment, rather than using SCons’ default “sanitized” environment. That is, he felt I should have used env = Environment() rather than env = Environment(ENV = os.environ). To ensure that this was not a factor, I modified the tests so that they did not inherit the environment. At the same time, I substituted echo for the compiler and other commands, in order to make the tests faster. Besides, I’m not interested in benchmarking the compiler — just SCons! Here’s what my Environment declaration looks like now:

env = Environment(CC = 'echo', AR = 'echo', RANLIB = 'echo')

With these changes in place I reran my benchmarks. As expected, there was no change in the outcome. There is no doubt: SCons does not scale linearly. Instead the growth is polynomial, following an n1.85 curve. And thanks to the the vmstat output we can be certain that there was absolutely no swapping affecting the benchmarks. Here’s a graph of the results, including an n1.85 curve for comparison — notice that you can barely see that curve because it matches the observed data so well!

SCons full build runtime - click for larger view

For comparison, I used the SCons build log to make a shell script that executes the same series of echo commands. At 50,000 targets, the shell script ran in 1.097s. You read that right: 1.097s. Granted, the shell script doesn’t do stuff like up-to-date checks, etc., but still — of the 3,759s average SCons runtime, 3,758s — 99.97% — is SCons overhead.

I also created a non-recursive Makefile that “builds” the same targets with the same echo commands. This is a more realistic comparison to SCons — after all, nobody would dream of actually controlling a build with a straight-line shell script, but lots of people would use GNU make to do it. With 50,000 targets, GNU make ran for 82.469s — more than 45 times faster than SCons.

What is linear scaling?

If the performance problems are so obvious, why did Dirk fail to see them? Here’s a graph made from his test results:

SCons full build runtime, via D. Baechle - click for full size

Dirk says that this demonstrates “SCons’ linear scaling”. I find this statement baffling, because his data clearly shows that SCons does not scale linearly. It’s simple, really: linear scaling just means that the build time increases by the same amount for each new target you add, regardless of how many targets you already have. Put another way, it means that the difference in build time between 1,000 targets and 2,000 targets is exactly the same as the difference between 10,000 and 11,000 targets, or between 30,000 and 31,000 targets. Or, put yet another way, it means that when you plot the build time versus the number of targets, you should get a straight line with no change in slope at any point. Now you tell me: does that describe Dirk’s graph?

Here’s another version of that graph, this time augmented with a couple additional lines that show what the plot would look like if SCons were truly scaling linearly. The first projection is based on the original graph from 2,500 to 4,500 targets — that is, if we assume that SCons scales linearly and that the increase in build time between 2,500 and 4,500 targets is representative of the cost to add 2,000 more targets, then this line shows us how we should expect the build time to increase. Similarly, the second projection is based on the original graph between 4,500 and 8,500 targets. You can easily see that the actual data does not match either projection. Furthermore you can see that the slope of these projections is increasing:

SCons full build runtime with linear projections, via D. Baechle - click for full size

This shows the importance of testing at large scale when you’re trying to characterize the scalability of a system from empirical data. It can be difficult to differentiate polynomial from logarithmic or linear at low scales, especially once you incorporate the constant factors — polynomial algorithms can sometimes even give better absolute performance for small inputs than linear algorithms! It’s not until you plot enough data points at large enough values, as I’ve done, that it becomes easy to see and identify the curve.

What does profiling tell us?

Next, Dirk reran some of his tests under a profiler, on the very reasonable assumption that if there was a performance problem to be found, it would manifest in the profiling data — surely at least one function would demonstrate a larger-than-expected growth in runtime. Dirk only shared profiling data for two runs, both incremental builds, at 8,500 and 16,500 targets. That’s unfortunate for a couple reasons. First, the performance problem is less apparent on incremental builds than on full builds. Second, with only two datapoints it is literally not possible to determine whether growth is linear or polynomial. The results of Dirk’s profiling was negative: he found no “significant difference or increase” in any function.

Fortunately it’s easy to run this experiment myself. Dirk used cProfile, which is built-in to Python. To profile a Python script you can inject cProfile from the command-line, like this: python -m cProfile scons. Just before Python exits, cProfile dumps timing data for every function invoked during the run. I ran several full builds with the profiler enabled, from 2,000 to 20,000 targets. Then I sorted the profiling data by function internal time (time spent in the function exclusively, not in its descendents). In every run, the same two functions appeared at the top of the list: posix.waitpid and posix.fork. To be honest this was a surprise to me — previously I believed the problem was in SCons’ Taskmaster implementation. But I can’t really argue with the data. It makes sense that SCons would spend most of its time running and waiting for child processes to execute, and even that the amount of time spent in these functions would increase as the number of child processes increases. But look at the growth in runtimes in these two functions:

SCons full build function time, top two functions - click for full size

Like the overall build time, these curves are obviously non-linear. Armed with this knowledge, I went back to Dirk’s profiling data. To my surprise, posix.waitpid and posix.fork don’t even appear in Dirk’s data. On closer inspection, his data seems to include only a subset of all functions — about 600 functions, whereas my profiling data contains more than 1,500. I cannot explain this — perhaps Dirk filtered the results to exclude functions that are part of the Python library, assuming that the problem must be in SCons’ own code rather than in the library on which it is built.

This demonstrates a second fundamental principle of performance analysis: make sure that you consider all the data. Programmers’ intuition about performance problems is notoriously bad — even mine! — which is why it’s important to measure before acting. But measuring won’t help if you’re missing critical data or if you discard part of the data before doing any analysis.


On the surface, performance analysis seems like it should be simple: start a timer, run some code, stop the timer. Done correctly, performance analysis can illuminate the dark corners of your application’s performance. Done incorrectly — and there are many ways to do it incorrectly — it can lead you on a wild goose chase and cause you to squander resources fixing the wrong problems.

Dirk Baechle had good intentions when he set out to analyze SCons performance, but he made some mistakes in his process that led him to an erroneous conclusion. First, he didn’t run enough large-scale tests to really see the performance problem. Second, he filtered his experimental data in a way that obscured the existence of the problem. But perhaps his worst mistake was to start with a conclusion — that there is no performance problem — and then look for data to support it, rather than starting with the data and letting it impartially guide him to an evidence-based conclusion.

To me the evidence seems indisputable: SCons exhibits roughly quadratic growth in runtimes as the number of build targets increases, rendering it unusable for large-scale software development (tens of thousands of build outputs). There is no evidence that this is a result of virtual memory swapping. Profiling suggests a possible pair of culprits in posix.waitpid and posix.fork. I leave it to Dirk and the SCons team to investigate further; in the meantime, you can find my test harness and test results in my GitHub repo. If you can see a flaw in my methodology, sound off in the comments!

Halloween 2013 haunted graveyard

I know it’s a bit late for a retrospective on my annual Halloween tradition — the haunted graveyard. But there were a couple additions this year that I thought were worth mentioning, and I have some really excellent photos thanks to the efforts of a good friend so I figured “Better late than never!” For those who haven’t seen my previous articles about the graveyard, let me offer a brief backstory:

After my wife and I inherited her mother’s house several years ago, it became our responsibility to host the annual family Halloween party. Originally the party was intended for my mother-in-law’s adult children and their spouses, but once they started having kids of their own, the party shifted gears. It’s been primarily targeted at the kids as long as I’ve been involved. One of the marquee attractions is the haunted graveyard, where our backyard is transformed from a normal suburban plot into a spooky graveyard replete with zombies, ghosts and other monsters. As the man of the house, and because I have a bit of a macabre streak, it falls to me to design and … execute … the decorations. Every year the graveyard gets a little bit more elaborate as I snatch up more characters at the post-Halloween sales and devise better layouts and designs for the graveyard.

As in previous years, I used a short two-foot tall wooden fence to establish a perimeter for the graveyard. This serves two purposes. First, it provides a pathway for visitors to follow, so they aren’t just meandering through the graveyard. That allows me to better control the experience and ensure that people don’t approach characters from the wrong side. Second, it gives a little protection to the characters, to discourage children from physically abusing the decorations. You can see the fencing here (as usual, click on any of these pictures for a larger version):

2013 Graveyard Fence

Along with the fence you can see one of the architectural improvements in the graveyard: a dividing wall down the middle of the area. This serves to block view of the characters on the second half of the graveyard when you are walking down the first half. I fashioned this out of several pieces of 1 inch PVC pipe and PVC pipe connectors at a total cost of about $20 at the local hardware store. PVC’s a great choice for this because it’s cheap, lightweight and easy to work with — all you need is a hacksaw. It’s also easy to setup and tear down. This diagram shows the dimensions and parts I used to build the frame:

2013 Graveyard - Partition

The curtain is made from two black plastic tablecloths, the type you can find at a party supply store around Halloween time. The ends are folded over and stapled together to make sleeves into which the top and bottom PVC pipes can slide. I think the tablecloths cost about $8 each. To be honest the whole thing looks pretty cheap in the daylight, but once it gets dark it doesn’t matter. Next year I’ll probably blacken the PVC with some spraypaint. If I can find some cheap cloth, I’ll redo the curtains as well. If you do build something like this yourself, make sure that the feet are pretty wide, and put something heavy on each to hold it down. Otherwise one good gust of wind will knock your wall right over!

As for new characters this year, I have this creepy-as-hell “deady bear”. For such a small guy, he is surprisingly disturbing. When activated, his head turns side-to-side, one eyeball lights up, he stabs himself with a knife, and he makes seriously unsettling noises. Like most of the characters he’s sound activated:

2013 Graveyard - Deady Bear

I’ve also got this “zombie barrel”, who lights up, makes spooky sounds and rises up out of his barrel when activated. To be honest although he’s big and was expensive, it’s not one of my favorite characters. He’s difficult to setup, and he moves too slowly to be really scary:

2013 Graveyard - Zombie Barrel

Unfortunately I don’t have a picture of the third new character — this black jumping spider. This guy is great because he moves quickly, and he’s hard to see since he’s black and starts at ground level. It’s very startling — perfect for somebody coming around a corner.

Besides the partition and the new characters, there was one other upgrade that I’m really pleased with: the flying phantasm. I’ve actually had this character for a couple of years, but usually he just hangs “lifelessly” over the path:

2013 Graveyard - Flying Phantasm

This year I ran a 1/4 inch rope from my second-story roof to the fence at the edge of our property, then attached the ghost to a cheap pulley that I hung on the line:

2013 Graveyard - Phantasm Pulley

Once the line was taut, the ghost could “fly” down the line across the pathway, low enough that his tattered robe would brush against the head and shoulders of anybody walking by. Triggering the ghost was about as low-tech as it gets: somebody would watch from a second floor window behind the ghost and release him at suitable moments; a bit of string tied to his back allowed my cohort to pull him back up in preparation for the next run. After dark, the flying phantasm was a huge hit, even scaring some of the adults who went through the graveyard! Here’s a video of the daytime test run; you can see where I’ve got a bit of extra rope tied around the line to serve as a stop so that he doesn’t go too far:

That’s about it for new features this year. Overall I think this goes down as another success, but there are some things I’d like to improve for next year:

  • Get more foot activation pads! I cannot stress enough how critical these are. The sound activation on most of these characters just does not work very well, especially when people are creeping silently through (because they are too scared to make much noise!). One trick though: make sure the pads are covered up, otherwise they are a huge tip off that something is coming!
  • Improve the lighting! I have simultaneously too much and not enough light. The colored floods I’ve been using are too bright, which makes things less scary. But they also don’t provide enough light in the right places. For example, it was hard to see the “deady bear”. Next year I’ll use dimmer bulbs, and maybe get some of those little “hockey puck” style LED lights to provide target “up lighting” for specific characters. I’d love to hang one dim, naked bulb over the whole scene as well, that could just swing back and forth slowly — I think that would be really creepy looking.
  • Don’t forget the soundtrack! I have one of those “scary halloween sounds” albums that I’ve played in the past to good effect. I completely forgot to set that up this year.

Hope you enjoyed this post mortem! To wrap up, here are some pictures of the graveyard after dark — credit goes to my good friend Tim Murphy, who took most of the pictures used in this post, and who was a tremendous help in setting up the graveyard. See you next Halloween!

2013 Graveyard - Wide Angle

2013 Graveyard - Open Grave

2013 Graveyard - Orange Glow

2013 Graveyard - Spider

2013 Graveyard - Green Glow

2013 Graveyard - Zombie

2013 Graveyard - Zombie Barrel Up

The ElectricAccelerator 7.1 “Ship It!” Award

Well, it took a lot longer than I’d like, but at last I can reveal the Accelerator 7.1 “Ship It!” award. This is the fifth time I’ve commemorated our releases in this fashion, which I think is pretty cool itself.

Since this release again focused on performance, I picked a daring old-timey airplane pilot — the sort of guy you might have found behind the controls of a Sopwith Camel, with a maximum speed of about 115mph. Here’s the trading card that accompanied the figure:

Accelerator 7.1 "Ship It!" Card Front - click for larger version

Accelerator 7.1 “Ship It!” Card Front – click for larger version

Accelerator 7.1 "Ship It!" Card Back - click for larger version

Accelerator 7.1 “Ship It!” Card Back – click for larger version

I included release metrics again, but where the 7.0 card showed just 10 data points, the 7.1 card packs in a whopping 48 by including data for the 12 most recent releases across four categories;

  • Number of days in development.
  • JIRA issues closed.
  • Total KLOC. This metric gives the total size of the Accelerator code base in thousands of lines of code, as measured with the excellent Count Lines of Code utility by Al Danial. This measurement excludes comments and whitespace.
  • Change in KLOC. This is simply the arithmetic difference between the total KLOC for each release and its predecessor.

Again, my sincere gratitude goes to everybody on the Accelerator team. Well done and thank you!

What’s new in GNU make 4.0?

After a little bit more than three years, the 4.0 release of GNU make finally arrived in October. This release packs in a bunch of improvements across many functional areas including debuggability and extensibility. Here’s my take on the most interesting new features.

Output synchronization

For the majority of users the most exciting new feature is output synchronization. When enabled, output synchronization ensures that the output of each job is kept distinct, even when the build is run in parallel. This is a tremendous boon to anybody that’s had the misfortune of having to diagnose a failure in a parallel build. This simple Makefile will help demonstrate the feature:

all: a b c
@echo COMPILE a
@sleep 1 && echo a, part 1
@sleep 1 && echo a, part 2
@sleep 2 && echo a, part 3
b c:
@echo COMPILE $@
@sleep 1 && echo $@, part 1
@sleep 1 && echo $@, part 2
@sleep 1 && echo $@, part 3

Now compare the output when run serially, when run in parallel, and when run in parallel with –output-sync=target:

Serial Parallel Parallel with –output-sync=target
$ gmake
a, part 1
a, part 2
a, part 3
b, part 1
b, part 2
b, part 3
c, part 1
c, part 2
c, part 3
$ gmake -j 4
b, part 1
a, part 1
c, part 1
b, part 2
a, part 2
c, part 2
b, part 3
c, part 3
a, part 3
$ gmake -j 4 --output-sync=target
c, part 1
c, part 2
c, part 3
b, part 1
b, part 2
b, part 3
a, part 1
a, part 2
a, part 3

Here you see the classic problem with parallel gmake build output logs: the output from each target is mixed up with the output from other targets. With output synchronization, the output from each target is kept separate, not intermingled. Slick! The output doesn’t match that of the serial build, unfortunately, but this is still a huge step forward in usability.

The provenance of this feature is especially interesting, because the idea can be traced directly back to me — in 2009, I wrote an article for CM Crossroads called Descrambling Parallel Build Logs. That article inspired David Boyce to submit a patch to GNU make in 2011 which was the first iteration of the –output-sync feature.

GNU Guile integration

The next major addition in GNU make 4.0 is GNU Guile integration, which makes it possible to invoke Guile code directly from within a makefile, via a new $(guile) built-in function. Naturally, since Guile is a general-purpose, high-level programming language, this allows for far more sophisticated computation from directly within your makefiles. Here’s an example that uses Guile to compute Fibonacci numbers — contrast with my Fibonacci in pure GNU make:

define FIBDEF
(define (fibonacci x)
(if (< x 2)
(+ (fibonacci (- x 1)) (fibonacci (- x 2)))))
$(guile $(FIBDEF))
@echo $(guile (fibonacci $@))

Obviously, having a more expressive programming language available in makefiles will make it possible to do a great deal more with your make-based builds than ever before. Unfortunately I think the GNU make maintainers made a couple mistakes with this feature which will limit its use in practice. First, Guile was a poor choice. Although it’s a perfectly capable programming language, it’s not well-known or in wide use compared to other languages that they might have chosen — although you can find Scheme on the TIOBE Index, Guile itself doesn’t show up, and even though it is the official extension language of the GNU project, fewer than 25 of the GNU project’s 350 packages use Guile. If the intent was to embed a language that would be usable by a large number of developers, Python seems like the no-brainer option. Barring that for any reason, Lua seems to be the de facto standard for embedded programming languages thanks to its small footprint and short learning curve. Guile is just some weird also-ran.

Second, the make/Guile integration seem a bit rough. The difficulty arises from the fact that Guile has a rich type system, while make does not — everything in make is a string. Consequently, to return values from Guile code to make they must be converted to a string representation. For many data types — numbers, symbols and of course strings themselves — the conversion is obvious, and reversible. But for some data types, this integration does a lossy conversion which makes it impossible to recover the original value. Specifically, the Guile value for false, #f, is converted to an empty string, rendering it indistinguishable from an actual empty string return value. In addition, nested lists are flattened, so that (a b (c d) e) becomes a b c d e. Of course, depending on how you intend to use the data, each of these may be the right conversion. But that choice should be left to the user, so that we can retain the additional information if desired.

Loadable objects

The last big new feature in GNU make 4.0 is the ability to dynamically load binary objects into GNU make at runtime. In a nutshell, that load of jargon means that it’s possible for you to add your own “built-in” functions to GNU make, without having to modify and recompile GNU make itself. For example, you might implement an $(md5sum) function to compute a checksum, rather than using $(shell md5sum). Since these functions are written in C/C++ they should have excellent performance, and of course they can access the full spectrum of system facilities — file I/O, sockets, pipes, even other third-party libraries. Here’s a simple extension that creates a $(fibonacci) built-in function:

#include <stdio.h>
#include <gnumake.h>

int plugin_is_GPL_compatible;

int fibonacci(int n)
    if (n < 2) {
        return n;
    return fibonacci(n - 1) + fibonacci(n - 2);

char *gm_fibonacci(const char *nm, unsigned int argc, char **argv)
    char *buf  = gmk_alloc(33);
    snprintf(buf, 32, "%d", fibonacci(atoi(argv[0])));
    return buf;

int fibonacci_gmk_setup ()
    gmk_add_function ("fibonacci", gm_fibonacci, 1, 1, 0);
    return 1;

And here’s how you would use it in a makefile:

load ./fibonacci.so
@echo $(fibonacci $@)

I’m really excited about this feature. People have been asking for additional built-in functions for years — to handle arithmetic, file I/O, and other tasks — but for whatever reason the maintainers have been slow to respond. In theory, loadable modules will enable people to expand the set of built-in functions without requiring the approval or involvement of the core team. That’s great! I only wish that the maintainers had been more responsive when we invited them to collaborate on the design, so we might have come up with a design that would work with both GNU make and Electric Make, so that extension authors need only write one version of their code. Ah well — que sera, sera.

Other features

In addition to the major feature described above there are several other enhancements worth mentioning here:

  • ::= assignment, equivalent to := assignment, added for POSIX compatibility.
  • != assignment, which is basically a substitute for $(shell), added for BSD compatibility.
  • –trace command-line option, which causes GNU make to print commnds before execution, even if they would normally be suppressed by the @ prefix.
  • $(file …) built-in function, for writing text to a file.
  • GNU make development migrated from CVS to git.

You can find the full list of updates in the NEWS file in the GNU make source tree.

Looking ahead

It’s great to see continued innovation in GNU make. Remember, this is a tool that’s now 25 years old. How much of the software you wrote 25 years ago is still in use and still in active development? I’d like to offer a heartfelt congratulations to Paul Smith and the rest of the GNU make team for their accomplishments. I look forward to seeing what comes next!

What’s new in ElectricAccelerator 7.1

ElectricAccelerator 7.1 hit the streets a last month, on October 10, just six months after the 7.0 release in April. There are some really cool new features in this release, which picks up right where 7.0 left off by adding even more ground-breaking performance features: schedule optimization and Javadoc caching. Here’s a quick look at each.

Schedule Optimization

The idea behind schedule optimization is really simple: we can reduce overall build duration if we’re smarter about the order in which jobs are run. In essense, it’s about packing the jobs in tighter, eliminating idle time in the middle of the build and reducing the “ragged right edge”. Here’s a side-by-side comparison of the same build, first using normal scheduling and then using schedule optimization. You can easily see that schedule optimization made the second build faster — an 11% improvement in this small, real-world example:

Build using naive scheduling -- click to view full size

Build using naive scheduling — click to view full size

Build using schedule optimization - click to view full size

Build using schedule optimization – click to view full size

If you study the two runs more closely, you can see how schedule optimization produced this improvement: key jobs, in particular the longest jobs, were started earlier. As a result, idle time in the middle of the build was reduced or eliminated entirely, and the right edge is much less uneven. But the best part? It’s completely automatic: all you have to do is run the build once for emake to learn its performance profile. Every subsequent build will leverage that data to improve build performance, almost like magic.

Not convinced? Here’s a look at the impact of schedule optimization on another, much bigger proprietary build (serial build time 18h25m). The build is already highly parallelizable and achieves an impressive 37.2x speedup with 48 agents — but schedule optimization can reduce the build duration by nearly 25% more, bringing to total speedup on 48 agents to an eye-popping 47.5x!

Build duration with naive and optimized scheduling

Build duration with naive and optimized scheduling

There’s another interesting angle to schedule optimization though. Most people will take the performance gains and use them to get a faster build on the same hardware. But you could go the other direction just as easily — keep the same build duration, but do it with dramatically less hardware. The following graph quantifies the savings, in terms of cores needed to achieve a particular build duration. Suppose we set a target build duration of 30 minutes. With naive scheduling, we’d need 48 agents to meet that target. With schedule optimization, we need only 38.

Resource requirements with naive and optimized scheduling - click for full size

Resource requirements with naive and optimized scheduling – click for full size

I’m really excited about schedule optimization, because it’s one of those rare features that give you something for nothing. It’s also been a long time coming — the idea was originally conceived of over three years ago, and it’s only now that we were able to bring it to fruition.

Schedule optimization works with emake on all supported platforms, with all emulation modes. It is not currently available for use with electrify.

Javadoc caching

The second major feature in Accelerator 7.1 is Javadoc caching. Again, it’s a simple idea: think “ccache”, but for Javadoc instead of compiles. This is the next phase in the evolution of Accelerator’s output reuse initiative, which began in the 7.0 release with parse avoidance. Like any output reuse feature, Javadoc caching works by capturing the product of a Javadoc invocation and storing it in a cache indexed by a hash of the inputs used — including the Java files themselves, the environment variables, and the command-line. In subsequent builds, emake will check those inputs again and if it computes the same hash, emake will used the cached results instead of running Javadoc again. On big Javadoc jobs, this can produce significant savings. For example, in the Android “Jelly Bean” open-source build, the main Javadoc invocation usually takes about five minutes. With Javadoc caching in Accelerator 7.1, that job runs in only about one minute — an 80% reduction! In turn that gives us a full one minute reduction in the overall build time, dropping the build from 13 minutes to 12 — nearly a 10% improvement:

Uncached Javadoc job in Android build - click for full image

Uncached Javadoc job in Android build – click for full image

Cached Javadoc job in Android build - click for full build

Cached Javadoc job in Android build – click for full image

Javadoc caching is available on Solaris and Linux only in Accelerator 7.1.

Looking ahead

I hope you’re as excited about Accelerator 7.1 as I am — for the second time this year, we’re bringing revolutionary new performance features to the table. But of course our work is never done. We’ve been hard at work on the “buddy cluster” concept for the next release of Accelerator. Hopefully I’ll be able to share some screenshots of that here before the end of the year. We’re also exploring acceleration for Bitbake builds like the Yocto Project. And last, but certainly not least, we’ll soon start fleshing out the next phase of output reuse in Accelerator — caching compiler invocations. Stay tuned!