What’s new in ElectricAccelerator 5.4.0

This month, Electric Cloud announced the release of ElectricAccelerator 5.4. This version adds a lot of great new features, including support for GNU Make’s .SECONDEXPANSION feature, the use of $(eval) in rule bodies, and compatibility with Cygwin 1.7.7. In addition to those long-awaited improvements, here are the things that I’m most excited about in this release:

New cluster utilization reports

Accelerator 5.4 includes two new reports designed to give you greater insight into the load on and utilization of your cluster: the Cluster Utilization report and the Sealevel report:

The Cluster Utilization report shows, over the course of a typical day, the average number of builds running and the average combined agent demand from all running builds. The Sealevel report shows the raw agent demand data, plotted over the course of a day. The colored bands correspond to various cluster sizes, including the current cluster size and several hypothetical sizes, so you can see at a glance how large you need to make the cluster in order to satisfy all the agent requests. The percentages on the right side of the graph indicate the portion of agent requests that are left unsatisfied with a cluster of the given size. In the example above, all but 1% of agent requests would be satisfied if the cluster had 40 agents.

Reduced directory creation conflicts

Raise your hand if you’ve ever seen this pattern in a makefile:

%.o: %.c
        @mkdir -p $(dir $@)
        @$(COMPILE.c) -o $@ $<

It’s a common way to ensure the output directory exists before trying to create a file in it. Unfortunately, with a strict application of Accelerator’s conflict detection algorithm, this pattern causes numerous conflicts and poor performance when the build is run without an up-to-date history file. In Accelerator 5.4.0, we improved the algorithm so that this common case is no longer considered a conflict. If you always run with a good history file, this change won’t matter much to you. But sometimes that’s not possible — for example, if you’re building third-party code that’s just gotten a major update — and in those cases you’re going to really love this improvement. The Android source code is a perfect example: a from-scratch, no-history build of the Gingerbread base used to take 144 minutes. Now it runs in just 22 minutes on the same hardware — 6.5x faster.

New Linux sandbox implementation

The last feature I want to mention here is the new sandbox implementation for Linux. The sandbox is the means by which Accelerator is able to present a different view of the filesystem, from a different point of time during the build, to each of the jobs running concurrently on a given agent host. Without the sandbox, it would be impossible on Linux to simultaneously represent a given file as existent to one job, and non-existent to another.

In previous versions of Accelerator, the Linux sandbox implementation was effective, but ultimately limited in its capabilities. Chief among those limitations was an inability to interoperate with autofs 5.x. There were several workarounds available, but each of those in turn had its own shortcomings.

Accelerator 5.4 uses a different underlying technology to implement the sandbox component: lofs, the loopback filesystem. This is a concept borrowed from Solaris, which has had a vendor-supplied version for years; Linux has nothing that matches the depth of functionality provided by Solaris, so we wrote our own. The net result of this effort is that the limitations of the previous implementation have been entirely eliminated. In particular, Accelerator 5.4 can interoperate with autofs 5.x without the need for any workarounds or awkward configuration.

Afterthoughts

It’s been a long time in coming, but I think it was well worth the wait. I’m very proud to have been part of this product release, and I’m thrilled with the work my team has put into it.

Accelerator 5.4 is available immediately for current customers. New customers should contact sales@electric-cloud.com.

Makefile hacks: print the value of any variable

One of my favorite makefile debugging tricks is this rule for printing out the value of a variable:

print-%:
        @echo '$*=$($*)'

Throw this into a GNU make makefile and then print any make variable you like by invoking targets like print-MAKE_VERSION:

ericm@chester:/tmp$ gmake print-MAKE_VERSION
MAKE_VERSION=3.81

You can imagine how handy this is when diagnosing issues with your makefiles. Here’s how it works:

  1. print-% defines a pattern rule that matches any target that starts with the characters print-.
  2. In the context of a pattern rule, the $* variable expands to the stem of the target, that part which matched the % in the pattern. In my example above, that corresponds to MAKE_VERSION.
  3. GNU make variable expansion rules allow for variable references inside variable names, so $($*) expands first to $(MAKE_VERSION), and finally to the value of the MAKE_VERSION variable.
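
Putting those three steps together, here’s a rough sketch of how the expansion plays out for the print-MAKE_VERSION example above (an illustration of the expansion order, not literal gmake output):

# target:  print-MAKE_VERSION   matches the print-% pattern
# stem:    $*                   expands to MAKE_VERSION
# value:   $($*)                expands to $(MAKE_VERSION), and then to 3.81
# recipe:  @echo '$*=$($*)'     runs as: echo 'MAKE_VERSION=3.81'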

Makefile injection with -f

The print-% rule is a slick hack, but it’s a nuisance to have to modify a makefile just to use it. Worse, you might not even be able to modify the makefile. Fortunately, there’s a solution: the -f command-line option. You’re probably familiar with it — that’s how you tell gmake to use a different makefile than the default Makefile when it starts. For example, if you have a makefile named build.mak:

gmake -f build.mak

What you may not know is that you can use multiple -f options on the command line. GNU make will read each file in turn, incorporating the contents of each just as if they were included with the include directive. We can create a simple makefile called printvar.mak containing nothing but our print-% rule, then inject it into any makefile we want like this:

gmake -f printvar.mak -f Makefile print-MAKE_VERSION
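
For reference, printvar.mak contains nothing more than the print-% rule shown earlier:

print-%:
	@echo '$*=$($*)'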

A shell script to save typing

The combination of the print-% rule and the -f command-line option is powerful, but it’s unwieldy — too many characters to type. The solution is a shell script wrapper:

#!/bin/bash

# Figure out which makefile gmake would read by default.
filename=""
if [ -f GNUmakefile ] ; then
  filename="GNUmakefile"
elif [ -f makefile ] ; then
  filename="makefile"
elif [ -f Makefile ] ; then
  filename="Makefile"
fi
if [ -n "$filename" ] ; then
  # Turn each variable name on the command line into a print-NAME target.
  vars=""
  for n in "$@" ; do
    vars="$vars print-$n"
  done
  # Note: this assumes printvar.mak is in the current directory; adjust the path to taste.
  gmake -f "$filename" -f printvar.mak $vars
else
  echo "No makefile found" 1>&2
  exit 1
fi

Save that in a file called printvars somewhere on your PATH and you can do things like this:

ericm@chester:/tmp$ printvars MAKE_VERSION COMPILE.cc
MAKE_VERSION=3.81
COMPILE.cc=g++    -c

Advanced make variable diagnostics

Beyond simply printing the value of a variable, GNU make 3.81 has three built-in functions that allow introspection on variables, which you can add to the print-% rule for additional diagnostics.

First is the $(origin) function, which tells you how a variable was defined. For example, if a variable FOO was inherited from the environment, $(origin FOO) will give the result environment. Variables defined in a makefile will give the result file, and so forth.

Next is the $(flavor) function, which tells you the flavor of the variable, either simple or recursive.
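
For example, a variable assigned with := has the simple flavor, while one assigned with = has the recursive flavor (a minimal illustration with made-up variable names):

CFLAGS_SIMPLE    := -O2 -g    # $(flavor CFLAGS_SIMPLE) gives "simple"
CFLAGS_RECURSIVE  = -O2 -g    # $(flavor CFLAGS_RECURSIVE) gives "recursive"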

Finally is the $(value) function, which gives you the unexpanded value of the variable. For example, if you have variables like this:

FOO=123
BAR=$(FOO)

$(value BAR) will give the result $(FOO), rather than the fully-expanded 123 that you might expect.

With these additions, the print-% rule now looks like this:

print-%:
	@echo '$*=$($*)'
	@echo '  origin = $(origin $*)'
	@echo '  flavor = $(flavor $*)'
	@echo '   value = $(value  $*)'

And here’s how it looks in action:

ericm@chester:/tmp$ printvars MAKE_VERSION COMPILE.cc
MAKE_VERSION=3.81
  origin = default
  flavor = simple
   value = 3.81
COMPILE.cc=g++    -c
  origin = default
  flavor = recursive
   value = $(CXX) $(CXXFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c

Make syntax is the worst… except for all the alternatives

My series of comparisons between SCons and GNU make sparked a lot of discussion, not just about SCons and gmake, but about many other build tools. That was to be expected, but what surprised me was several comments specifically criticizing the syntax of make — the semicolons, colons, ats and dollars that we all know so well. One reader actually said that make syntax has a 1970’s feel, as if the age of the language is somehow an indicator of unsuitability for the task. Then my friend John Graham-Cumming posted an article in defense of make syntax, and I figured I would add my thoughts.

Make syntax is the worst… except for all the alternatives

Criticisms of make syntax strike me as a bit absurd. Take a look around the build tool space: you’ll see that many of these “improved” tools use syntax that ranges from “pretty much the same” to “ridiculously verbose”. Let’s look at the syntax used for the two core functions of a build system: specifying the graph of dependencies between files, and specifying the commands to generate a file from a set of inputs.

Dependency graph syntax

The syntax for describing the relationship between input and output files in make is concise, if oblique:

foo: foo.in

To me this is elegant in its simplicity. You may argue that the choice of a colon is arbitrary, and you’d be right — but then, what would be significantly better? I would say there is nothing that is better, but plenty of things worse. For comparison, look at the same relationship, expressed in the syntax of some other build tools:

CMake
add_custom_command(
    OUTPUT foo
    COMMAND update -o foo foo.in
    DEPENDS foo.in)
Cook
foo: foo.in ;
Jam
MyCompile foo : foo.in ;
Rake
file foo => [ 'foo.in' ]
SCons
env.MyCompile('foo', 'foo.in')
tup
foo.in |> update -o %o %f |> foo
Waf
bld(
    rule     = 'update -o ${TGT} ${SRC}',
    source   = 'foo.in',
    target   = 'foo')

Some of these, like Cook and Jam, are nearly identical to make. Others, like Waf, are certainly more verbose, but not obviously better. That verbosity may not seem like a big deal when there’s only a handful of targets, but with hundreds of targets, it will be an irritation.

The truth is that there just isn’t any particular syntax that naturally lends itself to expressing a dependency graph. The reason make syntax hasn’t changed in over 30 years is that target: prereq works, and it’s just as good as anything else you might choose.

Command syntax

True to form, the syntax for specifying the commands to run to generate a file in make is just as terse:

update -o $@ $^
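
In context, the complete make rule pairing this command with the dependency line from above would look like this (update is the same hypothetical command used throughout these comparisons):

foo: foo.in
	update -o $@ $^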

This minimalist syntax naturally puts the emphasis on the important stuff: the command to run and its flags. Here’s the same command in some other build tools (nota bene: some of these are the same as what’s shown above; in those cases I could not easily determine a syntax for specifying dependencies separately from commands, or whether that is even possible with that tool):

CMake
add_custom_command (
  OUTPUT foo
  COMMAND update -o foo foo.in
  DEPENDS foo.in
)
Cook
{
     update -o [target] [need];
}
Jam
rule MyRule
{
    MyCompile $(1) $(2) ;
}
actions MyCompile
{
    update -o $(1) $(2)
}
Rake
sh "update -o #{t.name} #{t.prerequisites.join(' ')}"
SCons
env.Append(BUILDERS =
  {'MyCompile': Builder(action = 'update -o $TARGET $SOURCE', 
    src_suffix='.in')})
tup
foo.in |> update -o %o %f |> foo
Waf
bld(
    rule     = 'update -o ${TGT} ${SRC}',
    source   = 'foo.in',
    target   = 'foo')

Most of these are more verbose than make, and for me the extra text just makes it harder to see what’s really going on. The SCons example is particularly ugly: 6 times the characters to express the same simple command!

Did you mean TAB instead of 8 spaces?

I suspect that at the heart of complaints about make syntax is a single unfortunate confluence of facts. First, make uses a literal TAB character to mark the beginning of a command in a recipe. Second, most code editors automatically replace TAB with spaces. Together these facts conspire to confound even the most experienced makefile writer, resulting in this slightly condescending, always irritating error message:

*** missing separator (did you mean TAB instead of 8 spaces?)

I won’t argue with you, this is a real nuisance. But there’s good news: GNU make 3.82 introduced a new special variable called .RECIPEPREFIX. Set this variable to any character you like, and GNU make will use that instead of TAB to mark commands in the makefile. For example:

.RECIPEPREFIX=!
all:
!@echo Who says make syntax is bad?

Conclusion

Don’t get me wrong: as with any tool, there is room for improvement in make. I agree with John’s suggestion to optionally include command-lines and input file checksums in the up-to-date decisions (some of that is available now in ElectricAccelerator). Beyond that, I think it would be great to add support for non-pattern rules with multiple outputs — there’s no way to do that now, although there are a variety of hacks to emulate it (one such hack is sketched below). The interesting thing about these ideas is that all of them can be added to make, without requiring the creation of a completely new build tool.
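
For example, one of those hacks exploits the fact that a pattern rule with multiple targets runs its recipe just once to produce all of them, so recasting the rule as a pattern emulates a multi-output rule (here generate is a hypothetical command that writes both foo.c and foo.h from foo.in):

%.c %.h: %.in
	generate $<

It works, but only when the outputs happen to share a stem — which is exactly the sort of limitation a real multi-output feature would eliminate.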

Yes, make syntax is terse, but the lack of extraneous noise makes it easier to see what’s going on in a makefile than in a comparable build file from another tool. Likewise, make syntax is old, but rather than being a weakness, I see that as a testament to its fitness. Surely it’s telling that in 30 years, nothing else has come along that is obviously better, or sufficiently better to justify the cost of migration.

Shell commands in GNU make

For new users, the relationship between make and the shell can be confusing. I think people get thrown off by the make-specific syntax in makefiles — all those colons and at signs and percents. But the truth is that most of the content in a makefile is commands that are executed by the shell.

With GNU make, there are two ways to invoke shell commands from a makefile:

Recipes: the go-to-guy of shell commands in make

The recipe of a rule is the workhorse of gmake/shell integration. Structurally, the recipe is the list of commands used to generate the output file — each of the tab-initiated lines following a target: prereq declaration in the makefile. In the following makefile fragment, the recipe is the three indented command lines (note that the commands shown here are for illustration only; in a real makefile, you should use variables like $(CC), $@ and $< to ensure the makefile is portable and flexible):

foo.o: foo.c foo.h
	echo 'building foo.o'
	gcc -c -o foo.o foo.c
	echo 'done with foo.o'

You can think of the commands in a recipe as being invoked using the common idiom sh -c “command”. That means that you can use standard shell constructs like process pipelines and for loops. In turn, that flexibility means that recipes should be your “go-to guy” when it comes to invoking shell commands from a makefile. Want to preprocess your sources with sed before sending it to the compiler? Just tweak your recipe:

foo.o: foo.c foo.h
	echo 'building foo.o'
	sed -e 's/foo/bar/g' foo.c | gcc -c -o foo.o -xc -
	echo 'done with foo.o'

So recipes are the primary way to invoke shell commands in make. Here are some guidelines to remember:

If possible, gmake will invoke commands directly rather than using the shell.

Essentially, gmake scans the command-line for shell built-ins (like for and if) and “shell special characters” (like | and &). If none of these are present in the command-line, gmake will avoid the overhead of the shell invocation by invoking the command directly (literally just using execve to run the command).
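
For example (an illustrative fragment; the file names are placeholders):

foo.o: foo.c
	gcc -c -o foo.o foo.c                # no shell built-ins or special characters: run directly via execve

bar.o: bar.c
	gcc -c -o bar.o bar.c && echo done   # '&&' is special to the shell, so gmake falls back to sh -c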

Note that if you change the shell gmake uses by setting the SHELL makefile variable, then gmake will always use the shell to invoke commands, since it can’t know what commands and characters are “special” to your custom shell.

gmake expands command-lines before executing them.

Command expansion is why you can use gmake features like variables (eg, $@) and functions (eg, $(foreach)) in the recipe. It is also why you must use double dollar signs if you want to reference shell variables in your recipe:

abc: def
	let foo=1 ; echo $$foo

gmake executes each line in a recipe separately.

That means that there’s no sharing of state from one command to the next, and it’s why recipes like the following don’t work as expected:

abc: def
	let foo=1
	echo $$foo

Because this recipe contains two lines, gmake executes it in two pieces:

sh -c "let foo=1"
sh -c "echo $foo"

It’s obvious why this recipe doesn’t work, when written out that way: the variable assignment occurs in one shell, but the reference occurs in another. But there’s an easy way around this: line continuations. You’ve probably seen this technique in use:

abc: def
	let foo=1 ; \
	echo $$foo

Now gmake executes the recipe using a single shell invocation:

sh -c "let foo=1 ; echo $foo"

Nota bene: if you are using gmake 3.82 or later, you can enable the .ONESHELL feature, which causes gmake to invoke the entire recipe using a single shell invocation, even if you haven’t used line continuations.
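
For example, with the special target set, the two-line recipe from the earlier example works as intended (a minimal sketch):

.ONESHELL:
abc: def
	let foo=1
	echo $$foo

Because the entire recipe is passed to a single shell, the assignment and the echo now share state.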

The $(shell) function

The $(shell) function is the second way to invoke the shell from gmake. Its intended purpose is to capture the output of a command into a gmake variable. For example, you could save the name of the current user in the variable USERNAME this way:

USERNAME := $(shell whoami)

$(shell) takes a single argument, the command to run. Just like commands in recipes, if there are shell constructs in the command gmake will invoke it using sh -c “command”; otherwise, gmake will invoke the command directly. Likewise, gmake will expand variable and function references in the command before invoking it, so you must use double-dollar-signs to reference shell variables in that context:

TARGETS := $(shell for n in `seq -w 1 10`; do echo $$n; done)

Here are some guidelines to help you use $(shell) correctly and effectively:

If you’re not capturing the result of $(shell) to a variable, you’re probably misusing $(shell).

Here’s a real-world example of how not to use $(shell):

$(shell touch targets.mk)
include targets.mk
all: $(TARGETS)

The intent here was to ensure that targets.mk exists before gmake tries to include it. One problem with this approach is that it will touch the file every time you invoke the makefile, even on a “no touch” build! The correct way to accomplish this is with a proper rule for targets.mk:

include targets.mk

all: $(TARGETS)

targets.mk:
	@touch targets.mk

If targets.mk doesn’t exist, it will be created. Note that this particular example exploits the makefile remaking feature in gmake; in general though, if you’re using $(shell) this way, you can probably transform that usage into a regular rule, and get better performance and a more robust makefile for your trouble.

If you’re using $(shell) in a recipe, you’re probably misusing $(shell).

Another real-world example of $(shell) abuse:

foo.o: foo.c
	$(shell sed -e 's/foo/bar/' $< | gcc -o $@ -xc -)

Now that you know that recipes are implicitly using the shell, you can see that this use of $(shell) is utterly superfluous. The problem with this is that it moves the work into the command expansion phase, which means it can’t run in parallel with gmake. The fix for this one is to just drop the $(shell) call:

foo.o: foo.c
	sed -e 's/foo/bar/' $< | gcc -o $@ -xc -

Always use := assignment with $(shell).

I’ve written before about the importance of using := assignment with $(shell). In short: not using := assignment can cause your makefile to invoke the shell far more often than you realize, which can be a performance problem, and leave you with unpredictable build results. Always use := assignment with $(shell).
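
To make the difference concrete (a minimal sketch; the variable names are arbitrary):

# Recursive (=) assignment: the shell command is re-run every time NOW is expanded.
NOW   = $(shell date +%s)
# Simple (:=) assignment: the shell command runs exactly once, when the makefile is read.
START := $(shell date +%s)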

Conclusion

Hopefully now you see that the relationship between gmake and the shell is not so mysterious after all. Just remember: when in doubt, use a recipe, and don’t use $(shell) unless you’re capturing the result into a variable.

Faster builds through smarter scheduling: longest job first

One idea that comes up now and then is to speed up parallel builds by being smarter about the order used to run the jobs in the build. Obviously we don’t have complete control over the order — we have to respect the dependencies, of course — but at any given point there are probably multiple jobs ready-to-run. All things being equal, the build ought to finish sooner if we start the longest jobs first. But does it really work out that way?

A compelling example

Here’s an example that a user posted on the GNU make mailing list:

all: A B E
A: ; # 3-minute job.
B: C D ; # 1-minute job.
C D: ; # 1-minute job.
E: ; # 6-minute job.

Here’s the dependency graph for this simple makefile. The numbers in parenthesis indicate the serial order of the jobs in the build — the order the jobs will execute if the build runs serially:

Dependency graph for a simple build, with serial order marked

The serial order is also the order that make will start the jobs when running in parallel. For example, if we run this makefile with gmake -j 2, we would see the execution proceed as follows:

  • 0 minutes: gmake starts jobs A and C.
  • 1 minute: C completes and gmake starts D (two minutes left on job A).
  • 2 minutes: D completes and gmake starts B (one minute left on job A).
  • 3 minutes: A and B complete, and gmake starts E.
  • 9 minutes: E completes, and the build ends.

Visually, it looks like this:

But this ordering is obviously quite inefficient. Job E is not dependent on any other jobs, so there’s no reason we can’t start it sooner. In fact, if we start E first, then the execution looks like this:

By starting the longest jobs first, we can trim the overall build time by an impressive 30%! So: obviously we can fabricate a build that shows significant improvement by use of a longest-job-first scheduler. But will we see similar results from real builds?

Look before you leap

To answer this question, I made a build simulator that simulates running a build using longest-job-first scheduling. The simulator uses job duration and dependency information from an annotation file generated during a real build with ElectricAccelerator, a high-performance gmake replacement.

I tested the longest-job-first scheduler on several builds:

  • MySQL
  • Samba
  • Mozilla

To my disappointment, the new scheduler showed no significant benefit on these builds:


Build times in real builds are virtually unchanged with the longest-job-first scheduler. On some of these graphs you can barely even tell there are two distinct lines!

What went wrong?

I think there are two factors that explain this lackluster result. The first is homogeneity: in most builds, the majority of the jobs are more-or-less the same length. For example, 90% of the jobs in the Mozilla build are less than 0.25s long. 80% of the jobs in the Samba build are in the 2.5s to 5.0s range. What difference does it make to choose job A or job B when they both have nearly identical durations? Now, maybe you’re thinking “A-ha! What about the link job? That’s definitely longer than the other jobs!” And it’s true, the link job often is longer. But by its nature it can’t start any sooner than after all the other jobs have finished anyway — so again the longest-job-first scheduler has no choice to make, because there is only one job to choose.

The second factor is that the longest-job-first scheduler is smarter, but not smart enough: by considering only the length of each job in isolation, the longest-job-first scheduler cannot account for situations where a short job blocks a very long job. For example, suppose that we change our original example by adding a prereq for job E:

all: A B E
A: ; # 3-minute job.
B: C D ; # 1-minute job.
C D: ; # 1-minute job.
E: F ; # 6-minute job.
F: ; # 10-second job.

Because job F is so short, the longest-job-first scheduler will never prioritize it over jobs A, B, C, or D — in fact, the scheduler won’t run job F until it is literally the only choice left. Unfortunately, that means that job E won’t run until the end of the build, clobbering our overall build time.

Where do we go from here?

From these simulations, it’s clear that there’s little point in pursuing a simple longest-job-first scheduler. But I think there may be something to the idea of a scheduler that considers not just individual job lengths but the relationships between jobs. I’ll explore that possibility in a future post.

How long are the jobs in my build? part 2

In response to my post about visualizing the lengths of the jobs in a build, one reader suggested a few tweaks to my gnuplot script to make the graph a proper surface plot. I like the look of this:

This version addresses some of the shortcomings of my original:

  • It’s easier to determine the z-coordinate of a given point. In the original that was nearly impossible. It’s still a little tricky here because of the perspective, but it’s a step in the right direction.
  • Lower layers are not obscured. Originally, a dense layer of points could obscure points with a lower z-value. This version avoids that problem because you can see places where the surface dips.

Unfortunately, this version introduces some new problems:

  • Raw data points are averaged. In order to produce this surface plot, gnuplot computes a weighted average of the data points. Averaging itself is not necessarily a problem. The trouble here is that the layout of the data points is completely arbitrary, as you may recall from the previous post. That means that this plot effectively picks a handful of random data points, averages them, and plots the result. We still see the general trend — that most of the jobs are about the same length — but it feels a bit phony.
  • Implies patterns where there are none. When I first saw this image, I was struck by the “mountain range” running across the plot, a bit left of center. I hadn’t seen that in my original graph, so naturally I was intrigued. I spent hours trying to understand why that feature might be present, and finally came to this conclusion: it isn’t real. It’s just an artifact of the graphing method. Remember, the layout of the points is completely arbitrary, so it would be quite odd for there to really be a pattern like this cutting across the plot. In fact, I found that similar “features” appeared no matter what dimensions I used for the plot. I think the reason is that in this mode, gnuplot is not plotting the raw data, but rather a weighted average of adjacent points. This will tend to introduce relationships between those points that are not actually real.

OK, so this revised version is definitely interesting. I’m not sure that it’s better necessarily, given the defects I mentioned above. And unfortunately it doesn’t help at all with the issue of making something useful out of the X/Y coordinates. Nevertheless, thanks Aaron for the suggestion!

How long are the jobs in my build?

I’ve been playing with a new visualization for build data. I was looking for a way to really hammer home the point that in most builds, the vast majority of jobs are more-or-less the same length. The “Job Count by Length” report in ElectricInsight does the same thing, but in a “just the facts” manner. I wanted something that would be more visceral.

Then I struck on the idea of mapping the jobs onto a surface plot, using the job duration as the z-coordinate or “height”, so longer jobs appear as points high above the x-y plane. In such a view, we would expect to see a mostly flat plain, with a small portion of points floating above the plain. Sure enough, that’s just what we get. Here’s an example, generated using data from a mozilla build:

Here’s what I like about this visualization:

  • Nails the primary goal. This visualization is great at demonstrating that most jobs in the build have about the same duration.
  • It looks cool. Given a choice between two visualizations that show the same data, the one that looks cooler definitely has an advantage.

Now, here’s what I don’t like about this visualization:

  • X- and Y-coordinates are arbitrary. For this prototype I just determined the smallest box large enough to show all the jobs in the build, then plotted the first job at 0,0; the second at 0,1, etc. This is simple, and it gives a compact display, but it would be nice if the X- and Y-coordinates had some actual meaning.
  • It’s hard to tell what Z-coordinate any given point has. For example, I can easily see that the vast majority of jobs have roughly the same duration, but what duration is that? 0 seconds? 1 second? 1/2 second?
  • A dense upper layer obscures lower layers. Although this build is unimodal, suppose it was instead bimodal — the density of points at height 5 might obscure the existence of points at height 3.

For comparison, here’s the “Job Count by Length” report from ElectricInsight. It uses the same data, and tells the same story, but it’s not nearly as visually dramatic:

So, what do you think? Any ideas how I could use the X- and Y-coordinates to convey useful information? Keep reading if you want to see how I made this visualization.