Makefile hacks: automatically split long command lines

If you’ve worked on a large build system you’ve probably bumped into this error, or one like this:

gmake: execvp: /bin/sh: Argument list too long

This error means the length of some command-line in your makefile has grown past the system limit, which is typically in the 32 to 256 kilobyte range. It’s surprisingly easy to hit that limit. You start with a small list of object files to be linked together. Over time you add more, and the command-line gets a little longer. Add a few more and it gets longer still. Before you know it you have a monster command-line and your build starts failing.

The solution to this problem is simple: split the long command-line into several shorter command-lines. For example, ar r libraries/lib.a objects/foo.o objects/bar.o objects/baz.o objects/boo.o objects/bang.o becomes something like this:

ar r libraries/lib.a objects/foo.o objects/bar.o
ar r libraries/lib.a objects/baz.o objects/boo.o
ar r libraries/lib.a objects/bang.o

Simple in theory, but tedious to do by hand. And doing it manually is like putting a ticking time-bomb into your makefile — it’s only a matter of time before your build grows enough that you have to go through this exercise again.

I recently ran across a clever solution that exploits the $(eval) function in GNU make to split long command-lines automatically, eliminating the tedium and the time-bomb. After I show you the solution, I’ll explain it piece-by-piece.

The max_args function

The solution is a user-defined function called max_args that splits long command-lines into equal-length chunks:

1
2
3
4
5
6
7
8
9
define max_args
$(eval _args:=)
$(foreach obj,$3,$(eval _args+=$(obj))$(if $(word $2,$(_args)),$1$(_args)$(EOL)$(eval _args:=)))
$(if $(_args),$1$(_args))
endef
define EOL
endef

And an example of its use:

1
2
3
OBJS:=a b c d e f g h
all:
@$(call max_args,echo,2,$(OBJS))

The max_args function takes three parameters: the base command-line, the number of arguments per “chunk”, and the complete list of arguments. It expands to a series of command-lines — one for each chunk of arguments.

The trick behind max_args is the use of $(eval) to update a variable as a side-effect of gmake’s regular variable expansion activity. If you’re not familiar with gmake variable expansion, here’s a quick rundown: when gmake finds a variable or function reference, like $(something), it replace the entire reference with an expanded value. In the case of a variable that’s just the value of the variable. Most variables in gmake are recursive which means that if the variable value itself contains embedded variable references, those will be expanded as well, recursively. In the case of a function, gmake evaluates the function, and replaces the reference with the computed value.

The meat of max_args is on line 3. It starts with the $(foreach) function, which evaluates its third argument, the body of the loop, once for each word in its second argument — in this case, the list of objects passed in the call to max_args.

In max_args, the loop body has two components. The first is a call to $(eval), which simply appends the current value of the loop variable to an accumulator called _args.

The second component of the loop body uses $(if) and $(word) to check the length of _args. The $(word) function returns the nth word from a list, or an empty string if there are fewer than n words in the list. The $(if) function expands its second argument (the then clause) only if its first argument (the condition) expands to a non-empty string, so together these functions check if _args has the desired number of words, and if so the then clause of the $(if) is expanded.

The then clause of this $(if) has two components. The first constructs a completed command-line by concatenating the base command-line, here given by $1, the first argument to the original max_args call; the accumulated arguments; and a newline character. Thanks to the rules of gmake expansion, this command-line is added to the overall expansion result for the max_args function. The second part of the then clause uses $(eval) to reset the accumulator

If the chunk size does not evenly divide the number of arguments, the stragglers are emitted in a final command-line on the last line of max_args.

Limitations

max_args is handy but it has one significant limitation: command-line length limits are based on the number of bytes in the command-line, not the number of words, in it. Unfortunately, gmake has no built-in way to count the number of characters in a string. gmake does provide the $(words) built-in, so that’s what max_args uses. That just means that to use it effectively you have to take a guess at the number of arguments that will fit in a single command-line, for example by dividing the length limit by the average number of characters in each argument, then subtracting some to allow some buffer for outliers.

How ElectricMake guarantees reliable parallel builds

Parallel execution is a popular technique for reducing software build length, and for good reason. These days, multi-core computers have become standard — even my laptop has four cores — so there’s horsepower to spare. And it’s “falling over easy” to implement: just slap a “-j” onto your make command-line, sit back and enjoy the benefits of a build that’s 2, 3 or 4 times faster than it used to be. Sounds great!

But then, inevitably, invariably, you run into parallel build problems: incomplete dependencies in your makefiles, tools that don’t adequately uniquify their temp file names, and any of a host of other things that introduce race conditions into your parallel build. Sometimes everything works great, and you get a nice, fast, correct build. Other times, your build blows up in spectacular fashion. And then there are the builds that appear to succeed, but in fact generate bogus outputs, because some command ran too early and used files generated in a previous instead of the current build.

This is precisely the problem ElectricMake was created to solve — it gives you fast, reliable parallel builds, regardless of how (im)perfect your makefiles and tools are. If the build works serially, it will work with ElectricMake, but faster. If you’ve worked with parallel builds for any length of time, you can probably appreciate the benefit of that guarantee.

But maybe you haven’t had much experience with parallel builds yourself, or maybe you have but like many people, you don’t believe this problem can actually be solved. In that case, perhaps some data will persuade you. Here’s a sample of open source projects that don’t build reliably in parallel using gmake:

For each, I did several trials with gmake at various levels of parallelism, to determine how frequently the parallel build fails. Then, I did the same build several times with emake and again measured the success rate. Here you can see the classic problem of parallel builds with gmake — works great at low levels of parallelism (or serially, the “degenerate” case of parallel!), but as you ratchet up the parallelism, the build gets less and less reliable. At the same time, you can see that emake is rock solid regardless of how much parallelism you use:

Parallel build success rates

The prize for this reliability? Faster builds, because you can safely exploit more parallelism. Where gmake becomes unreliable with -j 3 or -j 4, emake is reliable with any number of parallel jobs.

How ElectricMake guarantees reliable parallel builds

The technology that enables emake to ensure reliable parallel builds is called conflict detection. Although there are many nuances to its implementation, the concept is simple. First, track every modification to every file accessed by the build as a distinct version of the file. Then, for each job run during the build, track the files used and verify that the job accessed the same versions it would have had the build run serially. Any mismatch is considered a conflict. The offending job is discarded along with any filesystem modifications it made, and the job is rerun to obtain the correct result.

The versioned file system

At the heart of the conflict detection system is a data structure known as the versioned file system, in which emake records every version of every file used over the lifetime of the build. A version is added to the data structure every time a file is modified, whether that be a change to the content of the file, a change in the attributes (like ownership or access permissions), or the deletion of the file. In addition to recording file state, a version records the job which created it. For example, here’s what the version chain looks like for a file “foo” which initially does not exist, then is created by job A with contents “abc”, deleted by job C, and recreated by job E with contents “123”:

Jobs

Jobs are the basic unit of work in emake. A job represents all the commands that must be run in order to build a single makefile target. In addition, every job has a serial order — the order in which the job would have run, had the build been run serially. The serial order of a job is dictated by the dependencies and structure of the makefiles that make up the build. Note that for a given build, the serial order is deterministic and unambiguous — even if the dependencies are incomplete, there is exactly one order for the jobs when the build is run serially.

With the serial order for every job in hand, deciding which file version should be used by a given job is simple: just find the version created by the job with the greatest serial order that precedes the job accessing the file. For example, using the version chain above (and assuming that the jobs’ names reflect their serial order), job B should use the version created by job A, while job D should see the file as non-existent, thanks to the version created by job C.

A job enters the completed state once all of its commands have been executed. At that point, any filesystem updates created by the job are integrated into the versioned filesystem, but, critically, they are not pushed to the real filesystem — that gives emake the ability to discard the updates if the job is later found to have conflicts.

Each job runs against a virtual filesystem called the Electric File System (EFS), rather than the real filesystem. The EFS serves several important functions: first, it is the means by which emake tracks file accesses. Second, it enables commands in the build to access file versions that exist in the versioned filesystem, but not yet on the real filesystem. Finally, it isolates simultaneously running jobs from one another, eliminating the possibility of crosstalk between commands.

Detecting conflicts

With all the data emake collects — every version of every file, and the relationship between every job — the actual conflict check is simple: for each file accessed by a job, compare the actual version to the serial version. The actual version is the version that was actually used when the job ran; the serial version is the version that would have been used, if the build had been run serially. For example, consider a job B which attempts to access a file foo. At the time that B runs, the version chain for foo looks like this:

Given that state, B will use the initial version of foo — there is no other option. The initial version is therefore the actual version used by job B. Later, job A creates a new version of foo:

Since job A precedes job B in serial order, the version created by job A is the correct serial version for job B. Therefore, job B has a conflict.

If a job is determined to be free of conflicts, the job is committed, meaning any filesystem updates are at last applied to the real filesystem. Any job that has a conflict is reverted — all versions created by the job are marked invalid, so subsequent jobs will not use them. The conflict job is then rerun in order to generate the correct result. The rerun job is committed immediately upon completion.

Conflict checks are carried out by a dedicated thread which inspects each job in strict serial order. That guarantees that a job is not checked for conflicts until after every job that precedes it in serial order has been successfully verified free of conflicts — without this guarantee, we can’t be sure that we know the correct serial version for files accessed by the job. Similarly, this ensures that the rerun job, if any, will use the correct serial versions for all files — so the rerun job is sure to be conflict free.

ElectricMake: reliable parallel builds

Conceptually, conflict detection is simple — keep track of every version of every file used in a build, then verify that each job used the correct version — but there are many details to its implementation. And in this article I’ve only covered the most basic implementation of conflict detection — after many years of experience and thousands of real-world builds we’ve tweaked the implementation, relaxing the simple definition of a conflict in specific cases in order to improve performance.

The benefit of conflict detection is simple too: reliable parallel builds, which in turn means shorter build times, regardless of how imperfect your makefiles are and how parallel-unsafe your toolchain may be.

Why is SCons so slow?

UPDATE: If you’re coming from Why SCons is not slow, you should read my response

A while back, I did a series of posts exploring the performance of SCons on builds of various sizes. The results were dismal: SCons demonstrated a classic O(n2) growth in runtime, meaning that the length of the build grew in proportion to the square of the number of files in the build, rather than linearly as one would hope. Naturally, that investigation and its results provoked a great deal of discussion at the time and since. Typically, SCons advocates fall back on one particular argument: “Sure, SCons may be slow,” they say, “but that’s the price you pay for a correct build.” Recently, Eric S. Raymond wrote an article espousing this same fundamental argument, with the addition of some algorithmic analysis intended to prove mathematically that a correct build, regardless of the build tool, must necessarily exhibit O(n2) behavior — a clever bit of circular logic, because it implies that any build tool that does not have such abyssmal performance must not produce correct builds!

Naturally, after spending nearly a decade developing a high-performance replacement for GNU make, I couldn’t let that statement stand. This post is probably going to be on the long side, so here’s the tl;dr summary:

  • You can guarantee correct builds with make, provided you follow best practices.
  • The worst-case runtime of any build tool if, of course, O(n2), but most, if not all, builds can be handled in O(n) time, without sacrificing correctness.
  • SCons’ performance problem is caused by design and implementation decisions in SCons, not some pathology of build structure.

What is required to ensure a correct build?

One of the fundamental tenents of the pro-SCons mythos is the idea that it is unique in its ability to guarantee correct builds. In reality, SCons is not doing anything particularly special in this regard. It’s true that by virtue of its design SCons makes it easier to get it right, but there’s nothing keeping you from enjoying the same assurances in make.

First: what is a correct build? Simply put, a correct build is one in which everything that ought to be built, is built. Note that by definition, a from-scratch build is correct, since everything is built in that case. So the question of “correct” or “incorrect” is really only relevant in regards to incremental builds.

So, what do we need in order to ensure a correct incremental build? Only three things, actually:

  1. A single, full-build dependency graph.
  2. Complete dependency information for every generated file.
  3. A reliable way to determine if a file is up-to-date relative to its inputs.

What SCons has done is made it more-or-less impossible, by design, to not have these three things. There is no concept like recursive make in the SCons world, so the only option is a single, full-build dependency graph. Likewise, SCons automatically scans input files in several programming languages to find dependency information. Finally, SCons uses MD5 checksums for the up-to-date check, which is a pretty darn reliable way to verify whether a given file needs to be rebuilt.

But the truth is, you can guarantee correct builds with make — you just have to adhere to long-standing best practices for make. First, you have to avoid using recursive make. Then, you need to add automatic dependency generation. The only thing that’s a little tricky is the up-to-date check: make is hardwired to use file timestamps, which can be spoofed, deliberately or accidentally — although to be fair, in most cases, timestamps are perfectly adequate. But even here, there’s a way out. You can use a smarter version of make that has a more sophisticated up-to-date mechanism, like ElectricMake or ClearMake. You can even shoehorn MD5 checksums into GNU make, if you like.

I can’t deny that SCons has made it easier to get correct builds. But the notion that it can’t be done with make is simply absurd.

What is the cost of a correct build?

Now we turn to the question of the cost of ensuring correctness. At its core, any build tool is just a collection of graph algorithms — first contructing the dependency graph, then traversing it to find and update out-of-date files. These algorithms have well-understood complexity, typically given as O(n + e), where n is the number of nodes in the graph, and e is the number of edges. It turns out that e is actually the dominant factor here, since it is at least equal to n, and at worst as much as n2. That means we can simplify the complexity to O(n + n2), or just O(n2).

Does this absolve SCons of its performance sins? Unfortunately it does not, because O(n2) is a worst-case bound — you should only expect O(n2) behavior if you’ve got a build that has dependencies between every pair of files. Think about that for a second. A dependency between every. pair. of. files. Here’s what that would look like in makefile syntax:

all: foo bar foo.c bar.c foo.h bar.h
foo:     bar foo.c bar.c foo.h bar.h
bar:         foo.c bar.c foo.h bar.h
foo.c:             bar.c foo.h bar.h
bar.c:                   foo.h bar.h
foo.h:                         bar.h

It’s ridiculous, right? I don’t know about you, but I’ve certainly never seen a build that does anything even remotely like that. In particular, the builds I used in my benchmarks don’t look like that. Fortunately, those builds are small and simple enough that we can directly count the number of edges in the dependency graph. For example, the smallest build in my tests consisted of:

2,000 C sources
+ 2,004 headers
+ 2,000 objects
+ 101 libraries
+ 100 executables

6,205 total files

So we have about 6,000 nodes in the graph, but how many edges does the graph contain? Lucky for us, SCons will print the complete dependency graph if we invoke it with scons –tree=all:

+-.
  +-SConstruct
  +-d1_0
  | +-d1_0/SConstruct
  | +-d1_0/f00000_sconsbld_d1_0
  | | +-d1_0/f00000_sconsbld_d1_0.o
  | | | +-d1_0/f00000_sconsbld_d1_0.c
  | | | +-d1_0/lup001_sconsbld_d1_0/f00000_sconsbld_d1_0.h
  ...

The raw listing contains about 35,000 lines of text, but that includes duplicates and non-dependency information like filesystem structure. Filter that stuff out and you can see the graph contains only about 12,000 dependencies. That’s a far cry from the 1,800,000 or so you would expect if this truly were a “worst-case” build. It’s clear, in fact, that the number of edges is best described as O(n).

Although I don’t know how (or even if it’s possible) to prove that this is the general case, it does make a certain intuitive sense: far from being strongly-connected, most of the nodes in a build dependency graph have just one or two edges. Each C source file, for example, has just one outgoing edge, to the object file generated from that source. Each object file has just one outgoing edge too, to the library or executable the object is part of. Sure, libraries and headers probably have more edges, since they are used by multiple executables or objects, but the majority of the stuff in the graph is going to fall into the “small handful of edges” category.

Now, here’s the $64,000 question: if the algorithms in a build tool scale in proportion to the number of edges in the dependency graph, and we’ve just shown that the dependency graph in question has O(n) edges, why does SCons use O(n2) time to execute the build?

Why is SCons so slow?

SCons’ O(n2) performance stems from its graph traversal implementation. Essentially, SCons scans the entire dependency graph each time it is looking for a file to update. n scans of a graph with O(n) nodes and edges equals an O(n2) graph traversal. There’s no mystery here. In fact, the SCons developers are clearly aware of this deficiency, as described on their wiki:

It’s worth noting that the Jobs module calls the Taskmaster once for each node to be processed (i.e., it’s O(n)) and the Taskmaster has an amortized performance of O(n) each time it’s called. Thus, the overall time is O(n^2).

But despite recognizing this flaw, they severely misjudged its impact, because they go on to state that it requires a “pathological” dependency graph in order to elicit this worst-case behavior from SCons. As we’ve shown here and in previous posts, even a terribly mundane dependency graph elicits O(n2) behavior from SCons. I shudder to think what SCons would do with a truly pathological dependency graph!

Obviously the next question is: why does SCons do this? That’s not quite as easy for me to explain, as an outside observer. To the best of my understanding, they rescan the graph just in case new dependencies are added to the dependency graph while evaluating a node in the graph — remember, in SCons the commands to update a file are expressed in Python, so they can easily manipulate the dependency graph even while the build is running.

Is it really necessary to rescan the dependency graph over and over? I don’t think so. In fact, make is proof that it is not necessary. I think there are two ways that SCons could address this problem: first, it could adopt GNU make’s convention of partitioning the build into distinct phases, one that updates dependency information, and a second that actually executes the build. In GNU make, that strategy allows for the introduction of new dependency information, while imposing only a one-time O(n) cost for restarting the make process if any new dependencies are found.

Alternatively, SCons could probably be made smarter about when a full rescan is required. Most of the time, even if new dependencies are added to the graph, they are added to the node being evaluated, not to nodes that were already visited. That is, when you scan a source file for implicit dependencies, you find the dependencies for that file not for other files in the build (duh). So most of the time, a full rescan is massive overkill.

The final word…?

Hopefully this is my last post on the subject of SCons performance. It is clear to me that SCons does not scale to large projects, and that the problem stems from design and implementation decisions in SCons, rather than some pathology in the build itself. You can get comparable guarantees of correctness from make, if you’re willing to invest the time to do things the right way. The payoff is a build system that is not only correct but has vastly better performance than SCons as your project grows. Why wouldn’t you want that?

What’s new in ElectricAccelerator 5.4.0

This month, Electric Cloud announced the release of ElectricAccelerator 5.4. This version adds a lot of great new features, including support for GNU Make’s .SECONDEXPANSION feature and the use of $(eval) in rule bodies, and compatibility with Cygwin 1.7.7. In addition to those long-awaited improvements, here are the things that I’m most excited about in this release:

New cluster utilization reports

Accelerator 5.4 includes two new reports designed to give you greater insight into the load on and utilization of your cluster: the Cluster Utilization report and the Sealevel report:

The Cluster Utilization report shows, over the course of a typical day, the average number of builds running and the average combined agent demand from all running builds. The Sealevel report shows the raw agent demand data, plotted over the course of a day. The colored bands correspond to various cluster sizes, including the current cluster size and several hypothetical sizes, so you can see at a glance how large you need to make the cluster in order to satisfy all the agent requests. The percentages on the right side of the graph indicate the portion of agent requests that are left unsatisfied with a cluster of the given size. In the example above, all but 1% of agent requests would be satisfied if the cluster had 40 agents.

Reduced directory creation conflicts

Raise your hand if you’ve ever seen this pattern in a makefile:

%.o: %.c
        @mkdir -p $(dir $@)
        @$(COMPILE.c) -o $@ $<

It’s a common way to ensure the output directory exists before trying to create a file in it. Unfortunately, with a strict application of Accelerator’s conflict detection algorithm, this pattern causes numerous conflicts and poor performance when the build is run without an up-to-date history file. In Accelerator 5.4.0, we improved the algorithm so that this common case is no longer considered a conflict. If you always run with a good history file, this change will not be helpful to you. But sometimes that’s not possible — for example, if you’re building third-party code that’s just gotten a major update — then you’re going to really love this improvement. The Android source code is a perfect example: a from-scratch no-history build of the Gingerbread base used to take 144 minutes. Now it runs in just 22 minutes on the same hardware — 6.5x faster.

New Linux sandbox implementation

The last feature I want to mention here is the new sandbox implementation for Linux. The sandbox is the means by which Accelerator is able to present a different view of the filesystem, from a different point of time during the build, to each of the jobs running concurrently on a given agent host. Without the sandbox, it would be impossible on Linux to simultaneously represent a given file as existent to one job, and non-existent to another.

In previous versions of Accelerator, the Linux sandbox implementation was effective, but ultimately limited in its capabilities. Chief among those limitations was an inability to interoperate with autofs 5.x. There were several workarounds available, but each of those in turn had its own shortcomings.

Accelerator 5.4 uses a different underlying technology to implement the sandbox component: lofs, the loopback filesystem. This is a concept borrowed from Solaris, which has had a vendor-supplied version for years; Linux has nothing that matches the depth of functionality provided by Solaris, so we wrote our own. The net result of this effort is that the limitations of the previous implementation have been entirely eliminated. In particular, Accelerator 5.4 can interoperate with autofs 5.x without the need for any workarounds or awkward configuration.

Afterthoughts

It’s been a long time in coming, but I think it was well worth the wait. I’m very proud to have been part of this product release, and I’m thrilled with the work my team has put into it.

Accelerator 5.4 is available immediately for current customers. New customers should contact sales@electric-cloud.com.

Makefile hacks: print the value of any variable

One of my favorite makefile debugging tricks is this rule for printing out the value of a variable:

print-%:
        @echo '$*=$($*)'

Throw this into a GNU make makefile and then print any make variable you like by invoking targets like print-MAKE_VERSION:

ericm@chester:/tmp$ gmake print-MAKE_VERSION
MAKE_VERSION=3.81

You can imagine how handy this is when diagnosing issues with your makefiles. Here’s how it works:

  1. print-% defines a pattern rule that matches any target that starts with the characters print-.
  2. In the context of a pattern rule, the $* variable expands to the stem of the target, that part which matched the % in the pattern. In my example above, that corresponds to MAKE_VERSION.
  3. GNU make variable expansion rules allow for variable references inside variable names, so $($*) expands first to $(MAKE_VERSION), and finally to the value of the MAKE_VERSION variable.

Makefile injection with -f

The print-% rule is a slick hack, but it’s a nuisance to have to modify a makefile just to use it. Worse, you might not even be able to modify the makefile. Fortunately, there’s a solution: the -f command-line option. You’re probably familiar with it — that’s how you tell gmake to use a different makefile than the default Makefile when it starts. For example, if you have a makefile named build.mak:

gmake -f build.mak

What you may not know is that you can use multiple -f options on the command line. GNU make will read each file in turn, incorporating the contents of each just as if they were included with the include directive. We can create a simple makefile called printvar.mak containing nothing but our print-% rule, then inject it into any makefile we want like this:

gmake -f printvar.mak -f Makefile print-MAKE_VERSION

A shell script to save typing

The combination of the print-% rule and the -f command-line option is powerful, but it’s unwieldy — too many characters to type. The solution is a shell script wrapper:

#!/bin/bash

filename=""
if [ -f GNUmakefile ] ; then
  filename="GNUmakefile"
elif [ -f makefile ] ; then
  filename="makefile"
elif [ -f Makefile ] ; then
  filename="Makefile"
fi
if [ -n "$filename" ] ; then
  vars=""
  for n in $@ ; do
    vars="$vars print-$n"
  done
  gmake -f $filename -f printvar.mak $vars
else
  echo "No makefile found" 1>&2
  exit 1
fi

Save that in a file called printvars somewhere on your PATH and you can do things like this:

ericm@chester:/tmp$ printvars MAKE_VERSION COMPILE.cc
MAKE_VERSION=3.81
COMPILE.cc=g++    -c

Advanced make variable diagnostics

Beyond simply printing the value of a variable, GNU make 3.81 has three built-in functions that allow introspection on variables, which you can add to the print-% rule for additional diagnostics.

First is the $(origin) function, which tells you how a variable was defined. For example, if a variable FOO was inherited from the environment, $(origin FOO) will give the result environment. Variables defined in a makefile will give the result file, and so forth.

Next is the $(flavor) function, which tells you the flavor of the variable, either simple or recursive.

Finally is the $(value) function, which gives you the unexpanded value of the variable. For example, if you have variables like this:

FOO=123
BAR=$(FOO)

$(value BAR) will give the result $(FOO), rather than the fully-expanded 123 that you might expect.

With these additions, the print-% rule now looks like this:

print-%:
	@echo '$*=$($*)'
	@echo '  origin = $(origin $*)'
	@echo '  flavor = $(flavor $*)'
	@echo '   value = $(value  $*)'

And here’s how it looks in action:

ericm@chester:/tmp$ printvars MAKE_VERSION COMPILE.cc
MAKE_VERSION=3.81
  origin = default
  flavor = simple
   value = 3.81
COMPILE.cc=g++    -c
  origin = default
  flavor = recursive
   value = $(CXX) $(CXXFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c

Make syntax is the worst… except for all the alternatives

My series of comparisons between SCons and GNU make sparked a lot of discussion, not just about SCons and gmake, but about many other build tools. That was to expected, but what surprised me was several comments specifically criticizing the syntax of make — the semicolons, colons, ats and dollars that we all know so well. One reader actually said that make syntax has a 1970’s feel, as if the age of the language is somehow an indicator of unsuitability for the task. Then my friend John Graham-Cumming posted an article in defense of make syntax, and I figured I would add my thoughts.

Make syntax is the worst… except for all the alternatives

Criticisms of make syntax strike me as a bit absurd. Take a look around the build tool space: you’ll see that many of these “improved” tools use syntax that ranges from “pretty much the same” to “ridiculously verbose”. Let’s look at the syntax used for the two core functions of a build system: specifying the graph of depdencies between files, and specifying the commands to generate a file from a set of inputs.

Dependency graph syntax

The syntax for describing the relationship between input and output files in make is concise, if oblique:

foo: foo.in

To me this is elegant in its simplicity. You may argue that the choice of a colon is arbitrary, and you’d be right — but then, what would be significantly better? I would say there is nothing that is better, but plenty of things worse. For comparison, look at the same relationship, expressed in the syntax of some other build tools:

CMake
add_custom_command(
    OUTPUT foo
    COMMAND update -o foo foo.in
    DEPENDS foo.in)
Cook
foo: foo.in ;
Jam
MyCompile foo : foo.in ;
Rake
file foo => [ 'foo.in' ]
SCons
env.MyCompile('foo', 'foo.in')
tup
foo.in |> update -o %o %f |> foo
Waf
bld(
    rule     = 'update -o ${TGT} ${SRC}
    source   = 'foo.in',
    target   = 'foo')

Some of these, like Cook and Jam, are nearly identical to make. Others, like Waf, are certainly more verbose, but not obviously better. That verbosity may seem great when there’s only a handful of targets, but with hundreds of targets, it will be an irritation.

The truth is that there just isn’t any particular syntax that naturally lends itself to expressing a dependency graph. The reason make syntax hasn’t changed in over 30 years is because target: prereq works, and it’s just as good as anything else you might choose.

Command syntax

True to form, the syntax for specifying the commands to run to generate a file in make is just as terse:

update -o $@ $^

This minimalist syntax naturally puts the emphasis on the important stuff: the command to run and its flags. Here’s the same command in some other build tools (nota bene: some of these are the same as what’s shown above; in those cases I could not easily determine a syntax for specifying dependencies separately from commands, or whether that is even possible with that tool):

CMake
add_custom_command (
  OUTPUT foo
  COMMAND update -o foo foo.in
  DEPENDS foo.in
)
Cook
{
     update -o [target] [need];
}
Jam
rule MyRule
{
    MyCompile $(1) $(2) ;
}
actions MyCompile
{
    update -o $(1) $(2)
}
Rake
sh "update -o #{t.name} #{t.prerequisites.join(' ')}"
SCons
env.Append(BUILDERS =
  {'MyCompile': Builder(action = 'update -o $TARGET $SOURCE', 
    src_suffix='.in')})
tup
foo.in |> update -o %o %f |> foo
Waf
bld(
    rule     = 'update -o ${TGT} ${SRC}
    source   = 'foo.in',
    target   = 'foo')

Most of these are more verbose than make, and for me the extra text just makes it harder to see what’s really going on. The SCons example is particularly ugly: 6 times the characters to express the same simple command!

Did you mean TAB instead of 8 spaces?

I suspect that at the heart of complaints about make syntax is a single unfortunate confluence of facts. First, make uses a literal TAB character to mark the beginning of a command in a recipe. Second, most code editors automatically replace TAB with spaces. Together these facts conspire to confound even the most experienced makefile writer, resulting in this slightly condescending, always irritating error message:

*** missing separator (did you mean TAB instead of 8 spaces?)

.

I won’t argue with you, this is a real nuisance. But there’s good news: GNU make 3.82 introduced a new special variable called .RECIPEPREFIX. Set this variable to any character you like, and GNU make will use that instead of TAB to mark commands in the makefile. For example:

.RECIPEPREFIX=! all: !@echo Who says make syntax is bad?

Conclusion

Don’t get me wrong: as with any tool, there is room for improvement in make. I agree with John’s suggestion to optionally include command-lines and input file checksums in the up-to-date decisions (some of that is available now in ElectricAccelerator). Beyond that I think it would be great to add support for non-pattern rules with multiple outputs — there’s no way to do that now, although there are a variety of hacks to emulate non-pattern rules with multiple outputs. The interesting thing about these ideas is that all of them can be added to make, without requiring the creation of a completely new build tool.

Yes, make syntax is terse, but the lack of extraneous noise makes it easier to see what’s going on in a makefile than in a comparable build file from another tool. Likewise, make syntax is old, but rather than being a weakness, I see that as a testament to it’s fitness. Surely it’s telling that in 30 years, nothing else has come along that is obviously better, or sufficiently better to justify the cost of migration.

Shell commands in GNU make

For new users, the relationship between make and the shell can be confusing. I think people get thrown off by the make-specific syntax in makefiles — all those colons and at signs and percents. But the truth is that most of the content in a makefile is commands that are executed by the shell.

With GNU make, there are two ways to invoke shell commands from a makefile:

Recipes: the go-to-guy of shell commands in make

The recipe of a rule is the workhorse of gmake/shell integration. Structurally, the recipe is the list of commands used to generate the output file — each of the tab-initiated lines following a target: prereq declaration in the makefile. In the following makefile fragment, I highlighted the recipe in red (note that the commands shown here are for illustration only; in a real makefile, you should use variables like $(CC), $@ and $< to ensure the makefile is portable and flexible):

foo.o: foo.c foo.h
echo 'building foo.o' gcc -c -o foo.o foo.c echo 'done with foo.o'

You can think of the commands in a recipe as being invoked using the common idiom sh -c “command”. That means that you can use standard shell constructs like process pipelines and for loops. In turn, that flexibility means that recipes should be your “go-to guy” when it comes to invoking shell commands from a makefile. Want to preprocess your sources with sed before sending it to the compiler? Just tweak your recipe:

foo.o: foo.c foo.h echo 'building foo.o' sed -e 's/foo/bar/g' foo.c | gcc -c -o foo.o -xc - echo 'done with foo.o'

So recipes are the primary way to invoke shell commands in make. Here are some guidelines to remember:

If possible, gmake will invoke commands directly rather using the shell.

Essentially, gmake scans the command-line for shell built-ins (like for and if) and “shell special characters” (like | and &). If none of these are present in the command-line, gmake will
avoid the overhead of the shell invocation by invoking the command directly (literally just using execve to run the command).

Note that if you change the shell gmake uses by setting the SHELL makefile variable, then gmake will always use the shell to invoke commands, since it can’t know what commands and characters are “special” to your custom shell.

gmake expands command-lines before executing them.

Command expansion is why you can use gmake features like variables (eg, $@) and functions (eg, $(foreach)) in the recipe. It is also why you must use double dollar signs if you want to reference shell variables in your recipe:

abc: def let foo=1 ; echo $$foo

gmake executes each line in a recipe separately.

That means that there’s no sharing of state from one command to the next, and it’s why recipes like the following don’t work as expected:

abc: def
let foo=1 echo $$foo

Because this recipe contains two lines, gmake executes it in two pieces:

sh -c "let foo=1" sh -c "echo $foo"

It’s obvious why this recipe doesn’t work, when written out that way: the variable assignment occurs in one shell, but the reference occurs in another. But there’s an easy way around this: line continuations. You’ve probably seen this technique in use:

abc: def
let foo=1 ; \ echo $$foo

Now gmake executes the recipe using a single shell invocation:

sh -c "let foo=1 ; echo $foo"

Nota bene: if you are using gmake 3.82 or later, you can enable the .ONESHELL feature, which causes gmake to invoke the entire recipe using a single shell invocation, even if you haven’t used line continuations.

The $(shell) function

The $(shell) function is the second way to invoke the shell from gmake. It’s intended purpose is to capture the output of a command into a gmake variable. For example, you could save the name of the current user in the variable USERNAME this way:

USERNAME := $(shell whoami)

$(shell) takes a single argument, the command to run. Just like commands in recipes, if there are shell constructs in the command gmake will invoke it using sh -c “command”; otherwise, gmake will invoke the command directly. Likewise, gmake will expand variable and function references in the command before invoking it, so you must use double-dollar-signs to reference shell variables in that context:

TARGETS := $(shell for n in `seq -w 1 10`; do echo $$n; done)

Here are some guidelines to help you use $(shell) correctly and effectively:

If you’re not capturing the result of $(shell) to a variable, you’re probably misusing $(shell).

Here’s a real-world example of how not to use $(shell):

$(shell touch targets.mk) include targets.mk all: $(TARGETS)

The intent here was to ensure that targets.mk exists before gmake tries to include it. One problem with this approach is that it will touch the file every time you invoke the makefile, even on a “no touch” build! The correct way to accomplish this is with a proper rule for targets.mk:

include targets.mk all: $(TARGETS) targets.mk: @touch targets.mk

If targets.mk doesn’t exist, it will be created. Note that this particular example exploits the makefile remaking feature in gmake; in general though, if you’re using $(shell) this way, you can probably transform that usage into a regular rule, and get better performance and a more robust makefile for your trouble.

If you’re using $(shell) in a recipe, you’re probably misusing $(shell).

Another real-world example of $(shell) abuse:

foo.o: foo.c $(shell sed -e 's/foo/bar/' $< | gcc -o $@ -xc -)

Now that you know that recipes are implicitly using the shell, you can see that this use of $(shell) is utterly superfluous. The problem with this is that it moves the work into the command expansion phase, which means it can’t run in parallel with gmake. The fix for this one is to just drop the $(shell) call:

foo.o: foo.c sed -e 's/foo/bar/' $< | gcc -o $@ -xc -

Always use := assignment with $(shell).

I’ve written before about the importance of using := assignment with $(shell). In short: not using := assignment can cause your makefile to invoke the shell far more often than you realize, which can be a performance problem, and leave you with unpredictable build results. Always use := assignment with $(shell).

Conclusion

Hopefully now you see that the relationship between gmake and the shell is not so mysterious after all. Just remember: when in doubt, use a recipe, and don’t use $(shell) unless you’re capturing the result into a variable.

Faster builds through smarter scheduling: longest job first

One idea that comes up now and then is to speed up parallel builds by being smarter about the order used to run the jobs in the build. Obviously we don’t have complete control over the order — we have to respect the dependencies, of course — but at any given point there are probably multiple jobs ready-to-run. All things being equal, the build ought to finish sooner if we start the longest jobs first. But does it really work out that way?

A compelling example

Here’s an example that a user posted on the GNU make mailing list:

all: A B E A: ; # 3-minute job. B: C D ; # 1-minute job. C D: ; # 1-minute job. E: ; # 6-minute job.

Here’s the dependency graph for this simple makefile. The numbers in parenthesis indicate the serial order of the jobs in the build — the order the jobs will execute if the build runs serially:

Dependency graph for a simple build, with serial order marked

The serial order is also the order that make will start the jobs when running in parallel. For example, if we run this makefile with gmake -j 2, we would see the execution proceed as follows:

  • 0 minutes: gmake start jobs A and C.
  • 1 minute: C completes and gmake starts D (two minutes left on job A).
  • 2 minutes: D completes and gmake starts B (one minute left on job A).
  • 3 minutes: A and B complete, and gmake starts E.
  • 9 minutes: E completes, and the build ends.

Visually, it looks like this:

But this ordering is obviously quite inefficient. Job E is not dependent on any other jobs, so there’s no reason we can’t start it sooner. In fact, if we start E first, then the execution looks like this:

By starting the longest jobs first, we can trim the overall build time by an impressive 30%! So: obviously we can fabricate a build that shows significant improvement by use of a longest-job-first scheduler. But will we see similar results from real builds?

Look before you leap

To answer this question, I made a build simulator that simulates running a build using longest-job-first scheduling. The simulator uses job duration and dependency information from an annotation file generated during a real build with ElectricAccelerator, a high-performance gmake replacement.

I tested the longest-job-first scheduler on several builds:

  • MySQL
  • Samba
  • Mozilla

To my disappointment, the new scheduler showed no significant benefit on these builds:


Build times in real builds are virtually unchanged with the longest-job-first scheduler. On some of these graphs you can barely even tell there are two distinct lines!

What went wrong?

I think there are two factors that explain this lackluster result. The first is homogeneity: in most builds, the majority of the jobs are more-or-less the same length. For example, 90% of the jobs in the Mozilla build are less than 0.25s long. 80% of the jobs in the Samba build are in the 2.5s to 5.0s range. What difference does it make to choose job A or job B when they both have nearly identical durations? Now, maybe you’re thinking “A-ha! What about the link job? That’s definitely longer than the other jobs!” And it’s true, the link job often is longer. But by its nature it can’t start any sooner than after all the other jobs have finished anyway — so again the longest-job-first scheduler has no choice to make, because there is only one job to choose.

The second factor is the longest-job-first scheduler is smarter, but not smart enough: by considering only the length of each job in isolation, the longest-job-first scheduler cannot account for situations where a short job blocks a very long job. For example, suppose that we change our original example by adding a prereq for job E:

all: A B E A: ; # 3-minute job. B: C D ; # 1-minute job. C D: ; # 1-minute job. E: F ; # 6-minute job. F: ; # 10-second job.

Because job F is so short, the longest-job-first scheduler will never prioritize it over jobs A, B, C, or D — in fact, the scheduler won’t run job F until it is literally the only choice left. Unfortunately, that means that job E won’t run until the end of the build, clobbering our overall build time.

Where do we go from here?

From these simulations, it’s clear that there’s little point in pursuing a simple longest-job-first scheduler. But I think there may be something to the idea of a scheduler that considers not just individual job lengths but the relationships between jobs. I’ll explore that possibility in a future post.

How long are the jobs in my build? part 2

In response to my post about visualizing the lengths of the jobs in a build, one reader suggested a few tweaks to my gnuplot script to make the graph a proper surface plot. I like the look of this:

This version addresses some of the short-comings of my original:

  • It’s easier to determine the z-coordinate of a given point. In the original that was nearly impossible. It’s still a little tricky here because of the perspective, but it’s a step in the right direction.
  • Lower layers are not obscured. Originally, a dense layer of points could obscure points with a lower z-value. This version avoids that problem because you can see places where the surface dips.

Unfortunately, this version introduces some new problems:

  • Raw data points are averaged. In order to produce this surface plot, gnuplot computes a weighted average of the data points. Averaging itself is not necessarily a problem. The trouble here is that the layout of the data points is completely arbitrary, as you may recall from the previous post. That means that this plot effectively picks a handful of random data points, averages them, and plots the result. We still see the general trend — that most of the jobs are about the same length — but it feels a bit phony.
  • Implies patterns where there are none. When I first saw this image, I was struck by the “mountain range” running across the plot, a bit left of center. I hadn’t seen that in my original graph, so naturally I was intrigued. I spent hours trying to understand why that feature might be present, and finally came to this conclusion: it isn’t real. It’s just an artifact of the graphing method. Remember, the layout of the points is completely arbitrary, so it would be quite odd for there to really be a pattern like this cutting across the plot. In fact, I found that similar “features” appeared no matter what dimensions I used for the plot. I think the reason is that in this mode, gnuplot is not plotting the raw data, but rather a weighted average of adjacent points. This will tend to introduce relationships between those points that are not actually real.

OK, so this revised version is definitely interesting. I’m not sure that it’s better necessarily, given the defects I mentioned above. And unfortunately it doesn’t help at all with the issue of making something useful out of the X/Y coordinates. Nevertheless, thanks Aaron for the suggestion!

How long are the jobs in my build?

I’ve been playing with a new visualization for build data. I was looking for a way to really hammer home the point that in most builds, the vast majority of jobs are more-or-less the same length. The “Job Count by Length” report in ElectricInsight does the same thing, but in a “just the facts” manner. I wanted something that would be more visceral.

Then I struck on the idea of mapping the jobs onto a surface plot, using the job duration as the z-coordinate or “height”, so longer jobs have points high above the z-axis. In such a view, we would expect to see a mostly flat plain, with a small portion of points above the plain. Sure enough, that’s just what we get. Here’s an example, generated using data from a mozilla build:

Here’s what I like about this visualization:

  • Nails the primary goal. This visualization is great at demonstrating that most jobs in the build have about the same duration.
  • It’s looks cool. Given a choice between two visualizations that show the same data, the one that looks cooler definitely has an advantage.

Now, here’s what I don’t like about this visualization:

  • X- and Y-coordinates are arbitrary. For this prototype I just determined the smallest box large enough to show all the jobs in the build, then plotted the first job at 0,0; the second at 0,1, etc. This is simple, and it gives a compact display, but it would be nice if the X- and Y-coordinates had some actual meaning.
  • It’s hard to tell what Z-coordinate any given point has. For example, I can easily see that the vast majority of jobs have roughly the same duration, but what duration is that? 0 seconds? 1 second? 1/2 second?
  • A dense upper layer obscures lower layers. Although this build is unimodal, suppose it was instead bimodal — the density of points at height 5 might obscure the existence of points at height 3.

For comparison, here’s the “Job count by Length” report from ElectricInsight. It uses the same data, and tells the same story, but it’s not nearly as visually dramatic:

So, what do you think? Any ideas how I could use the X- and Y-coordinates to convey useful information? Keep reading if you want to see how I made this visualization.
(more…)