#pragma multi and rules with multiple outputs in GNU make

Recently we released ElectricAccelerator 6.2, which introduced a new bit of makefile syntax — #pragma multi — which allows you to indicate that a single rule produces multiple outputs. Although this is a relatively minor enhancement, I’m really excited about it because it represents a new direction for emake development: instead of waiting for the GNU make project to add syntactic features and then following some time later with our emulation, we’re adding features that GNU make doesn’t have — and hopefully they will have to follow us for a change!

Unfortunately I haven’t done a good job articulating the value of #pragma multi. Unless you’re a pretty hardcore makefile developer, you probably look at this and think, “So what?” So let’s take a look at the problem that #pragma multi solves, and why #pragma multi matters.

Rules with multiple outputs in GNU make

The problem we set out to solve is simply stated: how can you specify to GNU make that one rule produces two or more output files? The obvious — but wrong — answer is the following:

foo bar: baz
        touch foo bar

Unfortunately, this fragment is interpreted by GNU make as declaring two rules, one for foo and one for bar — it just so happens that the command for each rule creates both files. That will do more-or-less the right thing if you run a from-scratch, serial build:

$ gmake foo bar
touch foo bar
gmake: `bar' is up to date.

By the time GNU make goes to update bar, it’s already up-to-date thanks to the execution of the rule for foo. But look what happens when you run this same build in parallel:

$ gmake -j 2 foo bar
touch foo bar
touch foo bar

Oops! — the files were updated twice. No big deal in this trivial example, but it’s not hard to imagine a build where running the commands to update a file twice would produce bogus output, particularly if those updates could be happening simultaneously.

So what’s a makefile developer to do? In standard GNU make syntax, there’s only one truly correct way to create a rule with multiple outputs: a pattern rule:

%.x %.y: %.in
        touch $*.x $*.y

In contrast with explicit rules, GNU make interprets this fragment as declaring a single rule that produces two output files. Sounds perfect, but there’s a significant limitation to this solution: all of the output files must share a common sequence in the filenames (called the stem in GNU make parlance). That is, if your rule produces foo.x and foo.y, then pattern rules will work for you because the outputs both have foo in their names.

If your output files do not adhere to that naming limitation, then pattern rules can’t help you. In that case, you’re pretty much out of luck: there is no way to correctly indicate to GNU make that a single rule produces multiple output files. There are a variety of hacks you can try to coerce GNU make to behave properly, but each has its own limitations. The most common is to nominate one of the targets as the “primary”, and declare that the others depend on that target:

bar: foo
foo: baz
        touch foo bar

Watch what happens when you run this build serially from scratch:

$ gmake foo bar
touch foo bar
gmake: Nothing to be done for `bar'.

Not bad, other than the odd “nothing to be done” message. At least the files weren’t generated twice. How about running it in parallel, from scratch?

$ gmake -j 2 foo bar
touch foo bar
gmake: Nothing to be done for `bar'.

Awesome! We still have the odd “nothing to be done” message, but just as in the serial build, the command was only invoked one time. Problem solved? Nope. What happens in an incremental build? If you’re lucky, GNU make happens to do the right thing and regenerate the files. But in one incremental build scenario, GNU make utterly fails to do the right thing. Check out what happens if the secondary output is deleted, but the primary is not:

$ rm -f bar && gmake foo bar
gmake: `foo' is up to date.
gmake: Nothing to be done for `bar'.

That’s right: GNU make failed to regenerate bar. If you’re very familiar with the build system, you might realize what had happened and think to either delete foo as well, or touch baz so that foo appears out-of-date (which would cause the next run to regenerate both outputs). But more likely at this point you just throw your hands up and do a full clean rebuild.

Note that all of the alternatives in vanilla GNU make have similar deficiencies. This kind of nonsense is why incremental builds have a bad reputation. This is why we created #pragma multi.

Rules with multiple outputs in Electric Make

By default Electric Make emulates GNU make, so it inherits all of GNU make’s limitations regarding rules with multiple outputs — with one critical exception. Even when running a build in parallel, Electric Make ensures that the output matches that produced by a serial GNU make build, which means that even the original, naive attempt will “work” for full builds regardless of whether the build is serial (single agent) or parallel (multiple agents).

Given that foundation, why did we bother with #pragma multi? There are a couple reasons:

  1. Correct incremental builds: with #pragma multi you can correctly articulate the relationships between inputs and outputs and thus ensure that all the outputs get rebuilt in incremental builds, rather than using kludges and hoping for the best.
  2. Out-of-the-box performance: although Electric Make guarantees correct output of the build, if you don’t have an up-to-date history file for the build you may waste time and compute resources running commands that don’t need to be run (work that will eventually be discarded when Electric Make detects the error). In the examples shown here the cost is negligible, but in real builds it could be significant.

Using #pragma multi is easy: just add the directive before the rule that will generate multiple outputs:

#pragma multi
foo bar: baz
        touch foo bar

Watch what happens when this makefile is executed with Electric Make:

$ emake foo bar
touch foo bar

Note that there is no odd “is up to date” or “nothing to be done” message for bar — because Electric Make understands that both outputs are created by a single rule. Let’s verify that the build works as desired in the tricky incremental case that foiled GNU make — deleting bar without deleting foo:

$ rm -f bar && emake foo bar
touch foo bar

As expected, both outputs are regenerated: even though foo existed, bar did not, so the commands were executed.

Summary: rules with multiple outputs

Let’s do a quick review of the strategies for creating rules with multiple outputs. For simplicity we can group them into three categories:

  • #pragma multi
  • The naive approach, which does not actually create a single rule with multiple outputs at all.
  • Any of the various hacks for approximating rules with multiple outputs.

Here’s how each strategy fares across a variety of build modes:

[Table: how each strategy (#pragma multi, the naive approach, the hacks) fares for full serial, full parallel, and incremental builds under both Electric Make and GNU make. The original used color-coded cells; the results are summarized in the paragraphs below. #pragma multi is N/A under GNU make.]

The table paints a grim picture for GNU make: there is no way to implement rules with multiple outputs using standard GNU make which reliably gives both correct results and good performance across all types of builds. The naive approach generates the output files correctly in serial builds, but may fail in parallel builds. The various hacks work for full builds, but may fail in incremental builds. Even in cases where the output files are generated correctly, the build is marred by spurious “is up to date” or “nothing to be done for” messages — which is why most of the entries in the GNU make side are yellow rather than green.

In contrast, #pragma multi allows you to correctly generate multiple outputs from a single rule, for both full and incremental builds, in serial and in parallel. The naive approach also “works” with Electric Make, in that it will produce correct output files, but like GNU make the build is cluttered with spurious warnings. And, unless you have a good history file, the naive approach can trigger conflicts which may negatively impact build performance. Finally, despite its sophisticated conflict detection and correction smarts, even Electric Make cannot ensure correct incremental builds when you’ve implemented one of the multiple output hacks.

So there you have it. This is why we created #pragma multi: without it, there’s just no way to get the job done quickly and reliably. You should give ElectricAccelerator a try.

What’s new in ElectricAccelerator 6.2?

We released ElectricAccelerator 6.2 a couple of weeks ago, our 25th feature release. 6.2 was a quick interim release primarily intended to address a couple of long-standing stability issues, but we managed to squeeze in some really interesting feature enhancements as well. Here’s what’s new:

Rules with multiple outputs? Yeah, we can do that.

Every now and then, makefile authors need to write a single makefile rule that produces more than one output file, to accommodate tools that don’t fit gmake’s rigid one-command-one-output model. The classic example is bison, which produces both a C file and a header file from a single invocation of the tool.

Unfortunately in regular gmake the only way to write a rule with multiple outputs is to use a pattern rule. That’s great — if your needs happen to dovetail with the caveats and limitations of pattern rules (chiefly, that the output files share a common base name). If not, the answer has basically been that you’re out of luck. There are a variety of kludges that approximate the behavior, but despite numerous requests over the last decade and at least one patch implementing the feature, GNU make (as of 3.82) still has no way to create an explicit rule that produces multiple outputs.

When it comes to enhancements to the fundamental operation of GNU make, we’ve historically let the GNU make team take the lead, rather than risk introducing potentially incompatible changes. But after so many years it seems clear that this feature is not going to show up in GNU make — so we decided to forge ahead on our own. Enter #pragma multi:

#pragma multi
foo bar:
        @touch foo bar

GNU make interprets this construct as two independent rules, one for foo and one for bar, which happen to each create both files. Thanks to the #pragma multi designation, Electric Make will interpret this as a single rule which produces both foo and bar. Using a #pragma to flag the rule is perfect, because it sidesteps any questions about syntax changes. And since #pragma starts with a #, GNU make will treat it as a comment, so this makefile will still be usable with GNU make — you’ll just get correct behavior and better performance with Electric Make.

New platforms and a faster installer

Accelerator 6.2 adds support for Linux kernels up to 3.5.x, which means that Accelerator now supports the following platforms:

  • Ubuntu 11.10
  • Ubuntu 12.04
  • SUSE Linux Enterprise Server 11 SP2

In addition, Accelerator 6.2 is expected to work correctly on both Ubuntu 12.10 and Windows 8, although we cannot officially claim support for those platforms since they were themselves not finalized at the time Accelerator 6.2 was released. This release also incorporates enhancements to the Linux installer which make the installation process about 25% faster compared to previous releases.

A complete list of platforms supported by ElectricAccelerator 6.2 can be found in the Electric Cloud Knowledge Base.

Key robustness improvements

Raise your hand if you’ve ever seen this error on your Linux Accelerator agent hosts:

unable to unmount EFS at “/some/path”: EBUSY

That error shows up sometimes when your build starts background processes — kind of a distributed build anti-pattern itself, but unfortunately it’s not always something you can control thanks to some third-party toolchains. Or rather, that error used to show up sometimes, because in Accelerator 6.2 we’ve bulletproofed the system against such rogue background processes, so that error is a thing of the past (nota bene: this enhancement is not available on Solaris).

In addition, we bulletproofed the system against external processes (any process running on an agent host which is not part of your build) accessing the EFS. In certain rare circumstances, such accesses could lead to agent host instability.

What’s next?

With 6.2 out the door we’ve finally got bandwidth to work on 7.0, which will focus on some very exciting performance improvements, especially for incremental builds. It’s a little bit too early to share any of the preliminary results we’re seeing, but rest assured — if you thought Accelerator was fast before, well… you ain’t seen nothing yet! Stay tuned for more information.

ElectricAccelerator 6.2 is available immediately. If you are already an Accelerator user, contact support@electric-cloud.com to upgrade. If you are not currently a user, you can download a free evaluation version of ElectricAccelerator Developer Edition, or contact sales@electric-cloud.com.

Fixing recursive make

Recursive make is one of those things that everybody loves to hate. It’s even been the subject of one of those tired “… Considered Harmful” diatribes. According to popular opinion, recursive make will sap performance from your build, make it nigh impossible to ensure correctness in parallel builds, and may render the user sterile. OK, maybe not that last one. But seriously, the arguments against recursive make are legion, and deeply entrenched. The problem? They’re flawed. That’s because they assume there’s only one way to implement recursive make — when the submake is invoked, the parent make is blocked until the submake completes. That’s how almost everybody does it. But in Electric Make, part of ElectricAccelerator, we developed a novel approach called non-blocking recursive make. This design eliminates the biggest problems attributed to recursive make, without requiring a painful and costly conversion of your build system to non-recursive make.

The problem with traditional recursive make

There are really just two problems at the heart of complaints with traditional recursive make: first, there’s no way to ensure correctness of a parallel recursive-make-based build without overserializing the submakes, because there’s no way to articulate dependencies between individual targets in different submakes. That means you can’t have a dependency graph that is both correct and precise. Instead you either leave out the critical dependency entirely, which makes parallel (i.e., fast) builds unreliable; or you serialize submakes in their entirety, which shackles build performance because no part of a submake with even a single dependency on some portion of an earlier submake can begin until the entire earlier submake completes. Second, even if there were a way to specify precise dependencies between targets in different submakes, most versions of make have implemented recursive make such that the parent make is blocked from proceeding until the submake has completed. Consider a typical use of recursive make with implicit serializations between submakes:

all:
        @for dir in util client server ; do \
          $(MAKE) -C $$dir; \
        done

Each submake compiles a bunch of source files, then links them together into a library (util) or an executable (client and server). The only actual dependency between the work in the three make instances is that the client and server programs need the util library. Everything else is parallelizable, but with traditional recursive make, gmake is unable to exploit that parallelism: all of the work in the util submake must finish before any part of the client submake begins!

Conflict detection and non-blocking recursive make

If you’re familiar with Electric Make, you already know how it solves the first half of the recursive make problem: conflict detection and correction. I’ve written about conflict detection before, but here’s a quick recap: using the explicit dependencies given in the makefiles and information about the files accessed as each target is built, emake is able to dynamically determine when targets have been built too early due to missing explicit dependencies, and rerun those targets to generate the correct output. Electric Make can ensure the correctness of parallel builds even in the face of incomplete dependencies, even if the missing dependencies are between targets in different submakes. That means you need not serialize entire submakes to ensure the build will run correctly in parallel.

Like an acrobat’s safety net, conflict detection allows us to consider solutions to the other half of the problem that would otherwise be considered risky, if not outright madness. In fact, our solution would not be possible without conflict detection: non-blocking recursive make. This is analogous to the difference between blocking and non-blocking I/O: rather than waiting for a recursive make to finish, emake carries on executing subsequent commands in the build immediately, including other recursive makes. Conflict detection ensures that only the commands in each submake which require serialization are executed sequentially, so the build runs as quickly as possible, but the final build output is identical to a serial build.

The impact of this change is dramatic. Here I’ve plotted the execution of the simple build defined above on four cores, using both gmake (normal recursive make) and emake (non-blocking recursive make):

[Figure: recursive make build with gmake]

[Figure: recursive make build with emake]
Electric Make is able to execute this build about 20% faster than gmake, with no changes to the Makefiles or the execution environment. emake is literally able to squeeze more parallelism out of recursive-make-based builds than gmake. In fact, we can precisely quantify just how much more parallelism emake gets through an application of Amdahl’s law. First, we compute the best possible speedup for the build — that’s just the serial runtime divided by the best possible parallel runtime, which we can figure out through analysis of the dependency graph and runtime of individual jobs in the build (the Longest Serial Chain report in ElectricInsight can do this for you). Then we can compute the parallelizable portion P of the build by plugging the speedup S into this equation: P = 1 - (1 / S). Here’s how that works out for gmake and emake:

                  gmake    emake
Serial baseline   65s      65s
Best build time   13.5s    7.5s
Best speedup      4.8x     8.7x
Parallel portion  79%      89%
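
Just to spell out the arithmetic, the parallel portions follow directly from the numbers in the table:

S(gmake) = 65 / 13.5 ≈ 4.8   →   P = 1 - 1/4.8 ≈ 0.79
S(emake) = 65 / 7.5  ≈ 8.7   →   P = 1 - 1/8.7 ≈ 0.89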

On this build, non-blocking recursive make increases the parallel portion of the build by 10%. That may not seem like much, but Amdahl’s law shows how dramatically that difference affects the speedup you can expect as you apply more cores.
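
As a rough illustration, plugging those parallel portions into Amdahl’s formula, speedup(N) = 1 / ((1 - P) + P / N), gives approximately:

N = 8:    P = 0.79 → ~3.2x        P = 0.89 → ~4.5x
N = 16:   P = 0.79 → ~3.9x        P = 0.89 → ~6.0x
N = 32:   P = 0.79 → ~4.3x        P = 0.89 → ~7.3x
N → ∞:    P = 0.79 → ~4.8x        P = 0.89 → ~9.1x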

Implementation

On the backend, non-blocking recursive make is handled by conflict detection — the jobs from the recursive make are checked for conflicts in the serial order defined by the makefile structure. Any issues caused by aggressively running recursive makes early are detected during the conflict check, and the target that ran too early is rerun to generate the correct result.

On the frontend, emake uses a strategy that is at once both brilliant in its simplicity, and diabolical in its trickery. It starts with an environment variable. When emake is invoked recursively, it checks the value of EMAKE_BUILD_MODE. If it is set to node, emake runs in so-called stub mode: rather than executing the submake (parsing the makefile and building targets), emake captures the invocation context (working directory, command-line and environment) in a file on disk, prints a “magic” string and exits with a zero status code.

The file containing the invocation context is identified by a second environment variable, ECLOUD_RECURSIVE_COMMAND_FILE. The Accelerator agent (which handles invoking commands on behalf of emake) checks for the presence of that file after every command that is run. If it is found, the agent relays the content to the toplevel emake invocation, where a new make instance is created to represent the submake invocation. That instance comes with its own parse job of course, which gets inserted into the queue of jobs. Some (short) time later, the parse job will run, discover whatever work must be run by the submake, and create additional rule jobs.

The magic string — EMAKE_FNORD — serves as a placeholder in the stdout stream for the jobs, so emake can figure out which portion of the output text comes before and which portion comes after the submake. This ensures that the build output log is identical to that generated by a serialized gmake build. For example, given the following rule that invokes a submake, you’d expect to see the “Before” and “After” messages printed before and after the output generated by commands in the submake itself:

all:
        @echo Before util ; \
        $(MAKE) -C util ; \
        echo After util

With non-blocking recursive make, the submake has not actually executed when the “echo After util” command runs. If emake doesn’t account for that reordering, both the “Before” and “After” messages will appear before any of the output from the submake. EMAKE_FNORD allows emake to “stitch” the output together so the build log matches a serial log.

Limitations

Conflict detection and non-blocking recursive make together solve the main problems associated with recursive make. But there are a couple scenarios where non-blocking recursive make does not work well. Fortunately, these are uncommon in practice and easily addressed.

Capturing recursive make stdout

The first scenario is when the build captures the output of the recursive make invocation, rather than letting it print to stdout as normal. Since emake defers the execution of the submake and prints only EMAKE_FNORD to stdout, this will not work. There are two reasons you might do this: first, you might want to have separate build logs for each submake, to simplify error detection and management. In this situation, the simplest workaround is to remove the redirection and instead use emake’s annotated build log, an XML version of the build output log which can be easily processed using standard tools. Second, you may be using make as a text-processing tool (sort of a “poor man’s” Perl), rather than for building per se:

all:
        @$(MAKE) -f genlist.mk > objects.txt
        @cat objects.txt | xargs rm

In this case, the workaround is to explicitly force emake to run in so-called “local” mode, which means emake will handle the recursive make invocation as a blocking invocation, just like traditional make would. You can force emake into local mode by adding EMAKE_BUILD_MODE=local to the environment before the recursive make invocation.
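
For example, building on the makefile above, one way to do that (a sketch, assuming a POSIX shell) is to set the variable inline in the recipe, so it only affects that one recursive invocation:

all:
        @EMAKE_BUILD_MODE=local $(MAKE) -f genlist.mk > objects.txt
        @cat objects.txt | xargs rm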

Immediate consumption of build products

The second scenario is when the build consumes the product of the submake in the same command that contains the invocation. For example:

all:
        @$(MAKE) -C sub foo && cp sub/foo ./foo

Here the build assumes that the output files generated by the submake will be available for use immediately after the submake completes. Obviously this is not the case with non-blocking recursive make — when the invocation of $(MAKE) -C sub foo completes, only the submake stub has actually finished. The build products will not be available until after the submake is actually processed later. Note that in this build both the recursive make invocation and the commands that use the build products from that invocation are treated as a single command from the perspective of make: make actually invokes the shell, and the shell then runs the recursive make and cp commands.

The workaround is simple: split the consumer into a distinct command, from the perspective of make:

all:
        @$(MAKE) -C sub foo
        @cp sub/foo ./foo

With that trivial change, emake is able to treat the cp as a continuation job, which can be serialized against the completion of the recursive make as needed.

A fix for recursive make

For years, people have heaped scorn and criticism on recursive make. They’ve nearly convinced everybody that even considering its use is automatically wrong — you probably can’t help feeling a little bit guilty when you use recursive make. But the reality is that recursive make is a reasonable way to structure a large build. You just need a better make. With conflict detection and non-blocking recursive make, Electric Make has fixed the problems usually associated with recursive make, so you can get parallel builds that are both fast and correct. Give it a try!

Another confusing conflict in ElectricAccelerator

After solving the case of the confounding conflict, my user came back with another scenario where ElectricAccelerator produced an unexpected (to him) conflict:

all:
        @$(MAKE) foo
        @cp foo bar

foo:
        @sleep 2 && echo hello world > foo

If you run this build without a history file, using at least two agents, you will see a conflict on the continuation job that executes the cp foo bar command, because that job is allowed to run before the job that creates foo in the recursive make invocation. After one run of course, emake records the dependency in history, so later builds don’t make the same mistake.

This situation is a bit different from the symlink conflict I showed you previously. In that case, it was not obvious what caused the usage that triggered the conflict (the GNU make stat cache). In this case, it’s readily apparent: the continuation job reads (or attempts to read) foo before foo has been created. That’s pretty much a text-book example of the sort of thing that causes conflicts.

What’s surprising in this example is that the continuation job is not automatically serialized with the recursive make that precedes it. In a very real sense, a continuation job is an artificial construct that we created for bookkeeping reasons internal to the implementation of emake. Logically we know that the commands in the continuation job should follow the commands in the recursive make. In fact it would be absolutely trivial for emake to just go ahead and stick in a dependency to ensure that the continuation is not allowed to start until after the recursive make finishes, thereby avoiding this conflict even when you have no history file.

Absolutely trivial to do, yes — but also absolutely wrong. Not for correctness reasons, this time, but for performance. Remember, emake is all about maximizing performance across a broad range of builds. Given a choice between two strategies that both produce correct output, emake uses the strategy that produces the best performance in the general case. For continuation jobs, that means not automatically serializing the continuation against the preceding recursive make. I could give you a wordy, theoretical explanation, but it’s easier to just show you. Suppose that your makefile looked like this instead of the original — the difference here is that the continuation job itself launches another recursive make, rather than just doing a simple cp:

all:
        @$(MAKE) foo
        @$(MAKE) bar

foo:
        @sleep 2 && echo hello world > foo

bar:
        @sleep 2 && echo goodbye > bar

Hopefully you agree that the ideal execution of this build would have both foo and bar running in parallel. Forcing the continuation job to be serialized with the preceding recursive make would choke the performance of this build. And just in case you’re thinking that emake could be really clever by looking at the commands to be executed in the continuation job, and only serializing “when it needs to”: it can’t. First, that would require emake to implement an entire shell syntax parser (or several, really, since you can override SHELL in your makefile). Second, even if emake had that ability, it would be thwarted the instant the command is something like my_custom_script.pl — there’s no way to tell what will happen when that gets invoked. It could be a simple filesystem access. It could be a recursive make. It could be a whole series of recursive makes. Even when the command is something you think you recognize, can emake really be sure? Maybe cp is not our trustworthy standard Unix cp, but something else entirely.

Again, all is not lost for this user. If you want to avoid this conflict, you have a couple options:

  1. Use a good history file from a previous build. This is the simplest solution. You’ll only get conflicts in this build if you run without a history file.
  2. Refactor the makefile. You can explicitly describe the dependency between the commands in the continuation job and the recursive make by refactoring the makefile so that the stuff in the continuation is instead its own target, thus taking the decision out of emake’s hands. Here’s one way to do that:
    all: do_foo
            @cp foo bar

    do_foo:
            @$(MAKE) foo

    foo:
            @sleep 2 && echo hello world > foo

Either of these will eliminate the conflict from your build.

ElectricAccelerator and the Case of the Confounding Conflict

A user recently asked me why ElectricAccelerator reports a conflict in this simple build, when executed without a history file from a previous run:

all: foo symlink_to_foo

foo:
        @sleep 2 && echo hello world > foo

symlink_to_foo:
        @ln -s foo symlink_to_foo

Specifically, if you have at least two agents, emake will report a conflict between symlink_to_foo and foo, indicating that symlink_to_foo somehow read or otherwise accessed foo during execution! But ln does not access the target of a symlink when creating the symlink — in fact, you can even create a symlink to a non-existent file if you like. It seems obvious that there should be no conflict. What’s going on?

To understand why this conflict occurs, you have to wrap your head around two things. First, there’s more going on during a gmake-driven build than just the commands you see gmake invoke. That causes the usage that provokes the conflict. Second, emake considers a serial gmake build the “gold standard” — if a serial gmake build produces a particular result, so too must emake. That’s why the additional usage must result in a conflict.

In this case, the usage that triggers the conflict comes from management of the gmake stat cache. This is a gmake feature that was added to improve performance by avoiding redundant calls to stat() — once you’ve stat()'d a file once, you don’t need to do it again. Unless the file is changed of course, which happens quite a lot during a build. To keep the stat cache up-to-date as the build progresses, gmake re-stat()'s each target after it finishes running the commands for the target. So after the commands for symlink_to_foo complete, gmake stat()'s symlink_to_foo again, using the standard stat() system call, which follows the symlink (in contrast to lstat(), which does not follow the symlink). That means gmake will actually cache the attributes of foo for symlink_to_foo.

To ensure compatibility with gmake, emake has to do the same. In Accelerator parlance, that means we get read usage on symlink_to_foo (because you have to read the symlink itself to determine the target of the symlink), and lookup usage on foo. The lookup on foo causes the conflict, because, of course, you will get a different result if you lookup foo before the job that creates it than you would get if you do the lookup after that job. Before the job, you’ll find that foo does not exist, obviously; after, you’ll find that it does.

But what difference does that make, really? In truth, if there’s no detectable difference in behavior, then it doesn’t matter at all. And in the example build there is no detectable difference — the build output is the same regardless of when exactly you stat() symlink_to_foo relative to when foo is created. But with a small modification to the build, it suddenly becomes possible to see the impact:

all: foo symlink_to_foo reader

foo:
        @sleep 2 && echo hello world > foo

symlink_to_foo:
        @ln -s foo symlink_to_foo

reader: foo symlink_to_foo
        @echo newer prereqs are: $?

Compare the output when this build is run serially with the output when the build is run in parallel — and note that I’m using gmake, so you can be certain I’m not trying to trick you with some peculiarity of emake’s implementation:
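
Approximately, the two runs boil down to this (illustrative output; only the echo from reader is visible, because the other commands are silenced with @):

$ gmake
newer prereqs are: foo symlink_to_foo

$ rm -f foo symlink_to_foo ; gmake -j 2
newer prereqs are: foo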

You can plainly see the difference: in the parallel build gmake stat()'s symlink_to_foo before foo exists, so the stat cache records symlink_to_foo as non-existent. Then when gmake generates the value of $? for reader, symlink_to_foo is excluded, because non-existent files are never considered newer than existing files. In the serial build, gmake stat()'s symlink_to_foo after foo has been created, so the stat cache indicates that symlink_to_foo exists and is newer than reader, so it is included in $?.

Hopefully you see now both what causes the conflict, and why it is necessary. The conflict occurs because of lookup usage generated when updating the stat cache. The conflict is necessary to ensure that the build output matches that produced by a serial gmake — the “gold standard” for build correctness. If no conflict is declared, there is the possibility for a detectable difference in build output compared to serial gmake.

However, you might be thinking that although it makes sense to treat this as a conflict in the general case, isn’t it possible to do something smarter in this specific case? After all, the original example build does not use $?, and without that there isn’t any detectable difference in the build output. So why not skip the conflict?

The answer is simple, if a bit disappointing. In theory it may be possible to elide the conflict by checking to see if the symlink is used by a later job in a manner that would produce a detectable difference (for example, by scanning the commands for subsequent targets for references to $?), but in reality the logistics of that check are daunting, and I’m not confident that we could guarantee correct behavior in all cases.

Fortunately all is not lost. If you wish to avoid this conflict, you have several options:

  1. Use a good history file from a previous build. This is the most obvious solution. You’ll only get conflicts if you run without a history file.
  2. Add an explicit dependency. If you make foo an explicit prereq of symlink_to_foo, then you will avoid the conflict. Here’s how that would look:
    symlink_to_foo: foo
  3. Change the serial order. If you reorder the makefile so that symlink_to_foo has an earlier serial order than foo you will avoid the conflict. That just requires a reordering of the prereqs of all:
    all: symlink_to_foo foo

Any one of these will eliminate the conflict from your build, and you’ll enjoy fast and correct parallel builds.

Case closed.

Makefile hacks: automatically split long command lines

If you’ve worked on a large build system you’ve probably bumped into this error, or one like this:

gmake: execvp: /bin/sh: Argument list too long

This error means the length of some command-line in your makefile has grown past the system limit, which is typically in the 32 to 256 kilobyte range. It’s surprisingly easy to hit that limit. You start with a small list of object files to be linked together. Over time you add more, and the command-line gets a little longer. Add a few more and it gets longer still. Before you know it you have a monster command-line and your build starts failing.

The solution to this problem is simple: split the long command-line into several shorter command-lines. For example, ar r libraries/lib.a objects/foo.o objects/bar.o objects/baz.o objects/boo.o objects/bang.o becomes something like this:

ar r libraries/lib.a objects/foo.o objects/bar.o
ar r libraries/lib.a objects/baz.o objects/boo.o
ar r libraries/lib.a objects/bang.o

Simple in theory, but tedious to do by hand. And doing it manually is like putting a ticking time-bomb into your makefile — it’s only a matter of time before your build grows enough that you have to go through this exercise again.

I recently ran across a clever solution that exploits the $(eval) function in GNU make to split long command-lines automatically, eliminating the tedium and the time-bomb. After I show you the solution, I’ll explain it piece-by-piece.

The max_args function

The solution is a user-defined function called max_args that splits long command-lines into equal-length chunks:

define max_args
$(eval _args:=)
$(foreach obj,$3,$(eval _args+=$(obj))$(if $(word $2,$(_args)),$1$(_args)$(EOL)$(eval _args:=)))
$(if $(_args),$1$(_args))
endef
define EOL


endef

And an example of its use:

OBJS:=a b c d e f g h
all:
        @$(call max_args,echo,2,$(OBJS))

The max_args function takes three parameters: the base command-line, the number of arguments per “chunk”, and the complete list of arguments. It expands to a series of command-lines — one for each chunk of arguments.
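
For example, running the makefile above with GNU make 3.80 or later (needed for $(eval)) should print something like:

$ gmake
a b
c d
e f
g h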

The trick behind max_args is the use of $(eval) to update a variable as a side-effect of gmake’s regular variable expansion activity. If you’re not familiar with gmake variable expansion, here’s a quick rundown: when gmake finds a variable or function reference, like $(something), it replaces the entire reference with an expanded value. In the case of a variable, that’s just the value of the variable. Most variables in gmake are recursive, which means that if the variable value itself contains embedded variable references, those will be expanded as well, recursively. In the case of a function, gmake evaluates the function, and replaces the reference with the computed value.
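
A quick illustration of that recursive expansion behavior (a toy example, separate from max_args):

# GREETING is a recursive variable: $(WHO) inside it is expanded when GREETING
# is referenced in the recipe, so it picks up the final value of WHO and the
# build prints "hello everybody".
WHO = world
GREETING = hello $(WHO)
WHO = everybody

all:
        @echo $(GREETING)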

The meat of max_args is the third line of the definition. It starts with the $(foreach) function, which evaluates its third argument, the body of the loop, once for each word in its second argument — in this case, the list of objects passed in the call to max_args.

In max_args, the loop body has two components. The first is a call to $(eval), which simply appends the current value of the loop variable to an accumulator called _args.

The second component of the loop body uses $(if) and $(word) to check the length of _args. The $(word) function returns the nth word from a list, or an empty string if there are fewer than n words in the list. The $(if) function expands its second argument (the then clause) only if its first argument (the condition) expands to a non-empty string, so together these functions check if _args has the desired number of words, and if so the then clause of the $(if) is expanded.

The then clause of this $(if) has two components. The first constructs a completed command-line by concatenating the base command-line, here given by $1, the first argument to the original max_args call; the accumulated arguments; and a newline character. Thanks to the rules of gmake expansion, this command-line is added to the overall expansion result for the max_args function. The second part of the then clause uses $(eval) to reset the accumulator.

If the chunk size does not evenly divide the number of arguments, the stragglers are emitted in a final command-line on the last line of max_args.

Limitations

max_args is handy but it has one significant limitation: command-line length limits are based on the number of bytes in the command-line, not the number of words in it. Unfortunately, gmake has no built-in way to count the number of characters in a string. gmake does provide the $(words) built-in, so that’s what max_args uses. That just means that to use it effectively you have to take a guess at the number of arguments that will fit in a single command-line, for example by dividing the length limit by the average number of characters in each argument, then subtracting a bit to leave some buffer for outliers.
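
For example, with illustrative numbers:

command-line limit ≈ 128 KB = 131,072 bytes
average argument   ≈ 64 bytes (including the separating space)
131,072 / 64 = 2,048 arguments → call max_args with a chunk size of, say, 1500 to leave headroom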

ElectricAccelerator Job Compendium

The fundamental unit of work in ElectricAccelerator is the job. Most of the time, you can think of a job as all the commands that must be run in order to create or update a single build output, but in truth that describes only one type of job. There are actually several different job types, each with a distinct purpose in the structure of a build and the way Accelerator executes the build. You can determine the type of a job from the type attribute on the <job> tag in Accelerator annotation files.

Having some familiarity with the job types and their use will make it easier for you to understand Accelerator performance and behavior, so I wrote this guide to introduce them. First I’ll describe the jobs used by ElectricMake (emake), and then the jobs used by Electrify.

ElectricMake jobs

In order to make the descriptions more concrete, I created a simple reference build that uses each of the most common job types, so you can see exactly how the jobs relate to a real build. Here’s the reference build makefile:

prog: 
        @$(MAKE) sub/prog
        @cp sub/prog ./prog

sub/prog: sub/main.c
        @cat $< > $@

setup:
        rm -rf prog sub
        mkdir sub
        echo "int main() { return 0; }" > sub/main.c
        touch prog2

To run the build, first run emake setup, which will create the files and directory structure needed by the build, then run emake --emake-maxagents=1 --emake-annodetail=basic --emake-annofile=emake.xml prog prog2. That will produce a build with thirteen different jobs, in two different make instances. The sections below describe how those jobs fit together.


parse

The first job in any make invocation (and therefore the first job of any emake build) is a parse job, during which emake reads and interprets the makefiles used in that make instance. The output of a parse job is a list of all the jobs in the make instance, along with a list of targets that must be built, the commands to build those targets, and the relationships between them.


exist

Existence jobs (marked as type “exist” in annotation) are used to check for the existence of makefiles and command-line goals for which no rule was found. In our reference build, you can see an existence job in the top-level make for the makefile itself, as well as one for the file prog2, which has no rule in the makefile.


remake

Remake jobs are at the center of emake’s emulation of GNU make’s makefile remaking feature. In a remake job, emake checks every makefile that was read during the parse job for two things:

  1. Is there a rule to rebuild the makefile; and
  2. Was the makefile actually rebuilt (because it was out-of-date).

If any makefile was rebuilt, then emake restarts the make instance — all the way back to the parse job.

Because makefile remaking is a gmake-specific feature, you will only see remake jobs when emake is emulating gmake — not when it’s emulating NMAKE.

One last note about remake jobs: emake overloads the use of the <failed> tag inside a remake job to indicate not failure, but whether or not the job determined that the make instance should be restarted. If yes, then the remake job will include a <failed> tag with the code attribute set to 1; if not, then the remake job will have no <failed> tag.


rule

Rule jobs are the real workhorses of a build. Each rule job encapsulates the commands needed to update one target in the build — literally the body of a rule from a makefile — and a rule always has an associated output target (or targets), which is identified by the name attribute of the job tag in annotation. In our reference build, the rule job in the toplevel make instance corresponds to this rule in the makefile:

prog: 
        @$(MAKE) sub/prog
        @cp sub/prog ./prog

If you look at the annotation for the job you’ll notice that only $(MAKE) sub/prog is actually in the rule job — because the cp sub/prog ./prog was automatically split into a continuation job when emake detected the recursive make invocation.


follow

A follow job serves two purposes. First, it is a connection point that allows emake to tie the end job of a recursive make invocation back into the ordered list of jobs in the parent make. Second, it is the means by which emake propagates the error status of a recursive make to the parent make.

Follow jobs always have an associated rule or continuation job, identified by the partof attribute on the job tag in annotation. That is the job that spawned the recursive make which the follow job is associated with. Of course, not all rule and continuation jobs have an associated follow job — follow jobs only show up when a recursive make was invoked.


continuation

A continuation job represents the “leftover” commands that follow a recursive make invocation in a rule (or continuation) job. In our reference build, emake creates a continuation job for cp sub/prog ./prog, because that command follows a recursive make invocation. Continuation jobs, like follow jobs, are always associated with a rule (or another continuation) job, identified by the partof attribute on the job tag in the annotation.

The reason for splitting the extra commands into a separate job is simple: that allows emake to easily specify the serial order of those commands relative to the jobs in the recursive make — the commands in the continuation come after the jobs in the recursive make.
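
Sketching how this hangs together in annotation for the reference build (the id values and overall layout here are invented for illustration; only the type, name, and partof attributes described above are taken from the annotation format):

<job id="J1" type="rule" name="prog">         <!-- runs: $(MAKE) sub/prog -->
  ...
</job>
<job id="J2" type="continuation" partof="J1"> <!-- runs: cp sub/prog ./prog -->
  ...
</job>
<job id="J3" type="follow" partof="J1">       <!-- ties the submake back into the parent -->
  ...
</job>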


end

The last job in every make instance (and therefore the last job of every emake build) is an end job. End jobs exist primarily to handle end-of-the-make cleanup, such as removing intermediate targets or temporary inline files that were created while executing the other jobs in the make.


(Not pictured) subbuild

Subbuild jobs, which were not used in the reference build, are part of emake’s subbuild feature. They are simply a mechanism to inject a recursive make invocation into the build, just as you would get if you had a rule job with a $(MAKE) command, but without the tedium of actually running that command.

Electrify jobs

Electrify uses much of the same underlying machinery that emake does, but it has its own set of job types for its particular needs.


alpha

Every electrify build starts with an alpha job, which is a trivial placeholder marking the start of the list of jobs.


external

External jobs represent commands invoked by the electrified build tool which are distributed to the cluster. These are similar to emake’s rule jobs, except they will only ever run a single command, while rule jobs may run any number of commands.


update

Update jobs represent filesystem modifications made by commands invoked by the electrified build tool but not distributed to the cluster. These modifications are detected with the aid of electrifymon.


omega

Every electrify build ends with an omega job, which, like the alpha job, is a trivial placeholder.

Conclusion

I hope you’ve found this post informative. Questions or feedback? Please feel free to comment below!

Why is SCons so slow?

UPDATE: If you’re coming from Why SCons is not slow, you should read my response

A while back, I did a series of posts exploring the performance of SCons on builds of various sizes. The results were dismal: SCons demonstrated a classic O(n^2) growth in runtime, meaning that the length of the build grew in proportion to the square of the number of files in the build, rather than linearly as one would hope. Naturally, that investigation and its results provoked a great deal of discussion at the time and since. Typically, SCons advocates fall back on one particular argument: “Sure, SCons may be slow,” they say, “but that’s the price you pay for a correct build.” Recently, Eric S. Raymond wrote an article espousing this same fundamental argument, with the addition of some algorithmic analysis intended to prove mathematically that a correct build, regardless of the build tool, must necessarily exhibit O(n^2) behavior — a clever bit of circular logic, because it implies that any build tool that does not have such abysmal performance must not produce correct builds!

Naturally, after spending nearly a decade developing a high-performance replacement for GNU make, I couldn’t let that statement stand. This post is probably going to be on the long side, so here’s the tl;dr summary:

  • You can guarantee correct builds with make, provided you follow best practices.
  • The worst-case runtime of any build tool is, of course, O(n^2), but most, if not all, builds can be handled in O(n) time, without sacrificing correctness.
  • SCons’ performance problem is caused by design and implementation decisions in SCons, not some pathology of build structure.

What is required to ensure a correct build?

One of the fundamental tenets of the pro-SCons mythos is the idea that it is unique in its ability to guarantee correct builds. In reality, SCons is not doing anything particularly special in this regard. It’s true that by virtue of its design SCons makes it easier to get it right, but there’s nothing keeping you from enjoying the same assurances in make.

First: what is a correct build? Simply put, a correct build is one in which everything that ought to be built, is built. Note that by definition, a from-scratch build is correct, since everything is built in that case. So the question of “correct” or “incorrect” is really only relevant in regards to incremental builds.

So, what do we need in order to ensure a correct incremental build? Only three things, actually:

  1. A single, full-build dependency graph.
  2. Complete dependency information for every generated file.
  3. A reliable way to determine if a file is up-to-date relative to its inputs.

What SCons has done is made it more-or-less impossible, by design, to not have these three things. There is no concept like recursive make in the SCons world, so the only option is a single, full-build dependency graph. Likewise, SCons automatically scans input files in several programming languages to find dependency information. Finally, SCons uses MD5 checksums for the up-to-date check, which is a pretty darn reliable way to verify whether a given file needs to be rebuilt.

But the truth is, you can guarantee correct builds with make — you just have to adhere to long-standing best practices for make. First, you have to avoid using recursive make. Then, you need to add automatic dependency generation. The only thing that’s a little tricky is the up-to-date check: make is hardwired to use file timestamps, which can be spoofed, deliberately or accidentally — although to be fair, in most cases, timestamps are perfectly adequate. But even here, there’s a way out. You can use a smarter version of make that has a more sophisticated up-to-date mechanism, like ElectricMake or ClearMake. You can even shoehorn MD5 checksums into GNU make, if you like.
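
To make that last point concrete, here is a minimal sketch of one common stamp-file approach to checksum-based up-to-date checks in GNU make — just one way to do it, assuming md5sum and cmp are available:

# foo.o depends on a checksum stamp rather than on foo.c's timestamp directly.
foo.o: foo.c.md5
        $(CC) -c -o foo.o foo.c

# The stamp rule runs every time (via FORCE), but only rewrites the stamp when
# the checksum actually changes, so foo.o is not rebuilt for a mere touch of foo.c.
%.md5: FORCE
        @md5sum $* | cmp -s - $@ || md5sum $* > $@

FORCE: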

I can’t deny that SCons has made it easier to get correct builds. But the notion that it can’t be done with make is simply absurd.

What is the cost of a correct build?

Now we turn to the question of the cost of ensuring correctness. At its core, any build tool is just a collection of graph algorithms — first constructing the dependency graph, then traversing it to find and update out-of-date files. These algorithms have well-understood complexity, typically given as O(n + e), where n is the number of nodes in the graph, and e is the number of edges. It turns out that e is actually the dominant factor here, since it is at least equal to n, and at worst as much as n^2. That means we can simplify the complexity to O(n + n^2), or just O(n^2).

Does this absolve SCons of its performance sins? Unfortunately it does not, because O(n^2) is a worst-case bound — you should only expect O(n^2) behavior if you’ve got a build that has dependencies between every pair of files. Think about that for a second. A dependency between every. pair. of. files. Here’s what that would look like in makefile syntax:

all: foo bar foo.c bar.c foo.h bar.h
foo:     bar foo.c bar.c foo.h bar.h
bar:         foo.c bar.c foo.h bar.h
foo.c:             bar.c foo.h bar.h
bar.c:                   foo.h bar.h
foo.h:                         bar.h

It’s ridiculous, right? I don’t know about you, but I’ve certainly never seen a build that does anything even remotely like that. In particular, the builds I used in my benchmarks don’t look like that. Fortunately, those builds are small and simple enough that we can directly count the number of edges in the dependency graph. For example, the smallest build in my tests consisted of:

2,000 C sources
+ 2,004 headers
+ 2,000 objects
+ 101 libraries
+ 100 executables

6,205 total files

So we have about 6,000 nodes in the graph, but how many edges does the graph contain? Lucky for us, SCons will print the complete dependency graph if we invoke it with scons --tree=all:

+-.
  +-SConstruct
  +-d1_0
  | +-d1_0/SConstruct
  | +-d1_0/f00000_sconsbld_d1_0
  | | +-d1_0/f00000_sconsbld_d1_0.o
  | | | +-d1_0/f00000_sconsbld_d1_0.c
  | | | +-d1_0/lup001_sconsbld_d1_0/f00000_sconsbld_d1_0.h
  ...

The raw listing contains about 35,000 lines of text, but that includes duplicates and non-dependency information like filesystem structure. Filter that stuff out and you can see the graph contains only about 12,000 dependencies. That’s a far cry from the 1,800,000 or so you would expect if this truly were a “worst-case” build. It’s clear, in fact, that the number of edges is best described as O(n).

Although I don’t know how (or even if it’s possible) to prove that this is the general case, it does make a certain intuitive sense: far from being strongly-connected, most of the nodes in a build dependency graph have just one or two edges. Each C source file, for example, has just one outgoing edge, to the object file generated from that source. Each object file has just one outgoing edge too, to the library or executable the object is part of. Sure, libraries and headers probably have more edges, since they are used by multiple executables or objects, but the majority of the stuff in the graph is going to fall into the “small handful of edges” category.

Now, here’s the $64,000 question: if the algorithms in a build tool scale in proportion to the number of edges in the dependency graph, and we’ve just shown that the dependency graph in question has O(n) edges, why does SCons use O(n^2) time to execute the build?

Why is SCons so slow?

SCons’ O(n^2) performance stems from its graph traversal implementation. Essentially, SCons scans the entire dependency graph each time it is looking for a file to update. n scans of a graph with O(n) nodes and edges equals an O(n^2) graph traversal. There’s no mystery here. In fact, the SCons developers are clearly aware of this deficiency, as described on their wiki:

It’s worth noting that the Jobs module calls the Taskmaster once for each node to be processed (i.e., it’s O(n)) and the Taskmaster has an amortized performance of O(n) each time it’s called. Thus, the overall time is O(n^2).

But despite recognizing this flaw, they severely misjudged its impact, because they go on to state that it requires a “pathological” dependency graph in order to elicit this worst-case behavior from SCons. As we’ve shown here and in previous posts, even a terribly mundane dependency graph elicits O(n^2) behavior from SCons. I shudder to think what SCons would do with a truly pathological dependency graph!

Obviously the next question is: why does SCons do this? That’s not quite as easy for me to explain, as an outside observer. To the best of my understanding, they rescan the graph just in case new dependencies are added to the dependency graph while evaluating a node in the graph — remember, in SCons the commands to update a file are expressed in Python, so they can easily manipulate the dependency graph even while the build is running.

Is it really necessary to rescan the dependency graph over and over? I don’t think so. In fact, make is proof that it is not necessary. I think there are two ways that SCons could address this problem: first, it could adopt GNU make’s convention of partitioning the build into distinct phases, one that updates dependency information, and a second that actually executes the build. In GNU make, that strategy allows for the introduction of new dependency information, while imposing only a one-time O(n) cost for restarting the make process if any new dependencies are found.
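
For reference, that two-phase convention is roughly the classic auto-dependency pattern — shown here as a simplified sketch, assuming gcc-style -MM/-MF options:

SRCS := foo.c bar.c

# If any .d file is missing or older than its source, make rebuilds it and then
# restarts itself once to pick up the newly discovered dependencies.
include $(SRCS:.c=.d)

%.d: %.c
        $(CC) -MM -MF $@ $<

%.o: %.c
        $(CC) -c -o $@ $<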

Alternatively, SCons could probably be made smarter about when a full rescan is required. Most of the time, even if new dependencies are added to the graph, they are added to the node being evaluated, not to nodes that were already visited. That is, when you scan a source file for implicit dependencies, you find the dependencies for that file not for other files in the build (duh). So most of the time, a full rescan is massive overkill.

The final word…?

Hopefully this is my last post on the subject of SCons performance. It is clear to me that SCons does not scale to large projects, and that the problem stems from design and implementation decisions in SCons, rather than some pathology in the build itself. You can get comparable guarantees of correctness from make, if you’re willing to invest the time to do things the right way. The payoff is a build system that is not only correct but has vastly better performance than SCons as your project grows. Why wouldn’t you want that?

Makefile hacks: print the value of any variable

One of my favorite makefile debugging tricks is this rule for printing out the value of a variable:

print-%:
        @echo '$*=$($*)'

Throw this into a GNU make makefile and then print any make variable you like by invoking targets like print-MAKE_VERSION:

ericm@chester:/tmp$ gmake print-MAKE_VERSION
MAKE_VERSION=3.81

You can imagine how handy this is when diagnosing issues with your makefiles. Here’s how it works:

  1. print-% defines a pattern rule that matches any target that starts with the characters print-.
  2. In the context of a pattern rule, the $* variable expands to the stem of the target, that part which matched the % in the pattern. In my example above, that corresponds to MAKE_VERSION.
  3. GNU make variable expansion rules allow for variable references inside variable names, so $($*) expands first to $(MAKE_VERSION), and finally to the value of the MAKE_VERSION variable.
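Putting those steps together, here’s a tiny self-contained example (the CFLAGS value is just something made up for illustration):

CFLAGS := -O2 -Wall

print-%:
	@echo '$*=$($*)'

# Running `gmake print-CFLAGS` matches the print-% pattern with stem
# CFLAGS, so $* expands to CFLAGS, $($*) expands to $(CFLAGS), and the
# recipe prints: CFLAGS=-O2 -Wall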

Makefile injection with -f

The print-% rule is a slick hack, but it’s a nuisance to have to modify a makefile just to use it. Worse, you might not even be able to modify the makefile. Fortunately, there’s a solution: the -f command-line option. You’re probably familiar with it — that’s how you tell gmake to use a different makefile than the default Makefile when it starts. For example, if you have a makefile named build.mak:

gmake -f build.mak

What you may not know is that you can use multiple -f options on the command line. GNU make will read each file in turn, incorporating the contents of each just as if they were included with the include directive. We can create a simple makefile called printvar.mak containing nothing but our print-% rule, then inject it into any makefile we want like this:

gmake -f printvar.mak -f Makefile print-MAKE_VERSION

A shell script to save typing

The combination of the print-% rule and the -f command-line option is powerful, but it’s unwieldy — too many characters to type. The solution is a shell script wrapper:

#!/bin/bash
# Wrapper for the print-% trick: find the makefile in the current directory,
# then ask gmake to print each variable named on the command line.
# Note: gmake looks for printvar.mak relative to the current directory, so
# keep it there (or change the -f argument to point at wherever you put it).

filename=""
if [ -f GNUmakefile ] ; then
  filename="GNUmakefile"
elif [ -f makefile ] ; then
  filename="makefile"
elif [ -f Makefile ] ; then
  filename="Makefile"
fi

if [ -n "$filename" ] ; then
  # Turn "MAKE_VERSION COMPILE.cc" into "print-MAKE_VERSION print-COMPILE.cc".
  vars=""
  for n in "$@" ; do
    vars="$vars print-$n"
  done
  # $vars is deliberately unquoted so it splits into one target per variable.
  gmake -f "$filename" -f printvar.mak $vars
else
  echo "No makefile found" 1>&2
  exit 1
fi

Save that in a file called printvars somewhere on your PATH and you can do things like this:

ericm@chester:/tmp$ printvars MAKE_VERSION COMPILE.cc
MAKE_VERSION=3.81
COMPILE.cc=g++    -c

Advanced make variable diagnostics

Beyond simply printing the value of a variable, GNU make 3.81 has three built-in functions that allow introspection on variables, which you can add to the print-% rule for additional diagnostics.

First is the $(origin) function, which tells you how a variable was defined. For example, if a variable FOO was inherited from the environment, $(origin FOO) will give the result environment. Variables defined in a makefile will give the result file, and so forth.

Next is the $(flavor) function, which tells you the flavor of the variable, either simple or recursive.

Finally, there is the $(value) function, which gives you the unexpanded value of the variable. For example, if you have variables like this:

FOO=123
BAR=$(FOO)

$(value BAR) will give the result $(FOO), rather than the fully-expanded 123 that you might expect.

With these additions, the print-% rule now looks like this:

print-%:
	@echo '$*=$($*)'
	@echo '  origin = $(origin $*)'
	@echo '  flavor = $(flavor $*)'
	@echo '   value = $(value  $*)'

And here’s how it looks in action:

ericm@chester:/tmp$ printvars MAKE_VERSION COMPILE.cc
MAKE_VERSION=3.81
  origin = default
  flavor = simple
   value = 3.81
COMPILE.cc=g++    -c
  origin = default
  flavor = recursive
   value = $(CXX) $(CXXFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c

Shell commands in GNU make

For new users, the relationship between make and the shell can be confusing. I think people get thrown off by the make-specific syntax in makefiles — all those colons and at signs and percents. But the truth is that most of the content in a makefile is commands that are executed by the shell.

With GNU make, there are two ways to invoke shell commands from a makefile:

Recipes: the go-to-guy of shell commands in make

The recipe of a rule is the workhorse of gmake/shell integration. Structurally, the recipe is the list of commands used to generate the output file, that is, each of the tab-initiated lines following a target: prereq declaration in the makefile. In the following makefile fragment, the recipe is the three indented command lines (note that the commands shown here are for illustration only; in a real makefile, you should use variables like $(CC), $@ and $< to ensure the makefile is portable and flexible):

foo.o: foo.c foo.h
	echo 'building foo.o'
	gcc -c -o foo.o foo.c
	echo 'done with foo.o'

You can think of the commands in a recipe as being invoked using the common idiom sh -c “command”. That means that you can use standard shell constructs like process pipelines and for loops. In turn, that flexibility means that recipes should be your “go-to guy” when it comes to invoking shell commands from a makefile. Want to preprocess your source with sed before sending it to the compiler? Just tweak your recipe:

foo.o: foo.c foo.h
	echo 'building foo.o'
	sed -e 's/foo/bar/g' foo.c | gcc -c -o foo.o -xc -
	echo 'done with foo.o'

So recipes are the primary way to invoke shell commands in make. Here are some guidelines to remember:

If possible, gmake will invoke commands directly rather than using the shell.

Essentially, gmake scans the command-line for shell built-ins (like for and if) and “shell special characters” (like | and &). If none of these are present in the command-line, gmake will avoid the overhead of the shell invocation by invoking the command directly (literally just using execve to run the command).

Note that if you change the shell gmake uses by setting the SHELL makefile variable, then gmake will always use the shell to invoke commands, since it can’t know what commands and characters are “special” to your custom shell.

gmake expands command-lines before executing them.

Command expansion is why you can use gmake features like variables (eg, $@) and functions (eg, $(foreach)) in the recipe. It is also why you must use double dollar signs if you want to reference shell variables in your recipe:

abc: def
	let foo=1 ; echo $$foo

gmake executes each line in a recipe separately.

That means that there’s no sharing of state from one command to the next, and it’s why recipes like the following don’t work as expected:

abc: def
	let foo=1
	echo $$foo

Because this recipe contains two lines, gmake executes it in two pieces:

sh -c "let foo=1" sh -c "echo $foo"

It’s obvious why this recipe doesn’t work, when written out that way: the variable assignment occurs in one shell, but the reference occurs in another. But there’s an easy way around this: line continuations. You’ve probably seen this technique in use:

abc: def
	let foo=1 ; \
	echo $$foo

Now gmake executes the recipe using a single shell invocation:

sh -c "let foo=1 ; echo $foo"

Nota bene: if you are using gmake 3.82 or later, you can enable the .ONESHELL feature, which causes gmake to invoke the entire recipe using a single shell invocation, even if you haven’t used line continuations.
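For example, here’s a sketch of what that looks like, reusing the same toy recipe from above; with .ONESHELL in effect, the assignment and the echo run in the same shell, so no line continuation is needed:

# Requires GNU make 3.82 or later.
.ONESHELL:

abc: def
	let foo=1
	echo $$foo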

The $(shell) function

The $(shell) function is the second way to invoke the shell from gmake. Its intended purpose is to capture the output of a command into a gmake variable. For example, you could save the name of the current user in the variable USERNAME this way:

USERNAME := $(shell whoami)

$(shell) takes a single argument, the command to run. Just like commands in recipes, if there are shell constructs in the command gmake will invoke it using sh -c “command”; otherwise, gmake will invoke the command directly. Likewise, gmake will expand variable and function references in the command before invoking it, so you must use double-dollar-signs to reference shell variables in that context:

TARGETS := $(shell for n in `seq -w 1 10`; do echo $$n; done)

Here are some guidelines to help you use $(shell) correctly and effectively:

If you’re not capturing the result of $(shell) to a variable, you’re probably misusing $(shell).

Here’s a real-world example of how not to use $(shell):

$(shell touch targets.mk)

include targets.mk

all: $(TARGETS)

The intent here was to ensure that targets.mk exists before gmake tries to include it. One problem with this approach is that it will touch the file every time you invoke the makefile, even on a “no touch” build! The correct way to accomplish this is with a proper rule for targets.mk:

include targets.mk

all: $(TARGETS)

targets.mk:
	@touch targets.mk

If targets.mk doesn’t exist, it will be created. Note that this particular example exploits the makefile remaking feature in gmake; in general though, if you’re using $(shell) this way, you can probably transform that usage into a regular rule, and get better performance and a more robust makefile for your trouble.

If you’re using $(shell) in a recipe, you’re probably misusing $(shell).

Another real-world example of $(shell) abuse:

foo.o: foo.c
	$(shell sed -e 's/foo/bar/' $< | gcc -o $@ -xc -)

Now that you know that recipes are implicitly using the shell, you can see that this use of $(shell) is utterly superfluous. The problem with this is that it moves the work into the command expansion phase, which is done by the gmake process itself, so the work can’t be parallelized with the rest of the build. The fix for this one is to just drop the $(shell) call:

foo.o: foo.c
	sed -e 's/foo/bar/' $< | gcc -o $@ -xc -

Always use := assignment with $(shell).

I’ve written before about the importance of using := assignment with $(shell). In short: not using := assignment can cause your makefile to invoke the shell far more often than you realize, which can be a performance problem, and leave you with unpredictable build results. Always use := assignment with $(shell).
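If it helps, here’s a small sketch of the difference (the variable names are invented for illustration): with recursive (=) assignment the shell command is re-run every time the variable is expanded, while with := it runs exactly once, when the makefile is read.

# Re-runs `whoami` on every expansion of NAME_SLOW:
NAME_SLOW  = $(shell whoami)
# Runs `whoami` once, when this line is parsed:
NAME_FAST := $(shell whoami)

all:
	@echo $(NAME_FAST)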

Conclusion

Hopefully now you see that the relationship between gmake and the shell is not so mysterious after all. Just remember: when in doubt, use a recipe, and don’t use $(shell) unless you’re capturing the result into a variable.