post

What’s new in ElectricAccelerator 9.1?

We recently released ElectricAccelerator 9.1, the 33rd feature release in the product’s 15-year-and-counting development history. This release includes several enhancements which I’m pleased to share with you: a new look-and-feel, improved scalability, and a new flexible licensing system to accommodate small- and medium-sized teams! Read on for more details.

Cluster Manager Dashboard

The most visible change in 9.1 is the all-new Cluster Manager dashboard, which collects several pieces of information about the health and performance of your cluster in what we hope will be a one-stop-shop for cluster monitoring. We tried to pack in a lot of usable data, while maintaining the clean look-and-feel that is the hallmark of the new Cluster Manager interface:

dashboard_full

The top of the dashboard will look familiar if you ever tried ElectricAccelerator Huddle, where the metrics proved so popular with users that we decided to surface them in the standard Accelerator UI as well. Across the top of the page you’ll find the following information:

Agents The total number of agents in the cluster. If any are offline, a warning icon is shown next to the count. Clicking the icon will show you the bad agents.
Running builds Number of builds currently in progress.
CPU Hours Used The total CPU time used by all builds ever run on the cluster. For example, a build that used 10 agents for 1 hour used a total of 10 CPU-hours.
Developer Hours Saved The total time saved by using Accelerator. For example, if your build takes 10 hours when run serially but just 1 hour with Accelerator, you save 9 hours each time you run that build.
Days Remaining Number of days until your license expires — so you know when to renew.

Below the row of metrics the dashboard is in two columns. On the left you’ll find these sections:

Welcome: a brief description of the major new features in the release you’re using, as well as information about new releases if you’re not running the latest version.

Online Resources: links to sources of help like the ElectricAccelerator Knowledge Base and Ask Electric Cloud, our community Q&A site.

Lightning Lessons: short tutorials and demos to help new users get started using ElectricAccelerator to crush build times.

Finally, in the right-hand column of the dashboard you’ll find some preset reports:

Agent Usage: this graph shows agent availability and demand over the past 24 hours, so you can quickly see if usage has exceeded capacity, indicating that you need to expand your cluster.

Build Duration: here you’ll see the duration of every build run in the past 24 hours, colored according to build class, so you can easily spot aberations. Clicking any of the data points will take you to the details page for that build.

Clean, Modern Cluster Manager Interface

As excited as I am about the new Cluster Manager dashboard, the user interface updates don’t end there. We’ve overhauled the entire CM UI, the first complete overhaul since version 4.0 in 2007. With this release the UI has a modernized look-and-feel, and uses the same visual design elements as ElectricFlow — so we have a consistent design language across Electric Cloud’s suite of products. Functionally the UI is not much changed, although filters are a bit more flexible and easier to use. Rather than belabor the point, take a look at these screenshots of the Builds and Agents pages:

Builds

Agents

Back-end Updates: Java 8 and 64-bit

Accelerator 9.1 includes Cluster Manager improvements under-the-hood as well. First off we attended to some long overdue maintenance by updating from Java 6 to Java 8. This required us to update many of the third-party libraries upon which the Cluster Manager is built, which in turn prompted a variety of source code changes to account for changes in APIs — affecting about 26% of the Java classes in our implementation. For now the primary benefit of this work is improved stability and reliability as we pulled in fixes in those third-party libraries. But in future releases, the groundwork we’ve done in 9.1 will enable us to take advantage of modern language features in Java 8, and to use new third-party Java integrations that have been introduced in the past few years.

The other major back-end change for the Cluster Manager is that it now runs on top of a 64-bit JVM. This enables the CM to more easily manage the large, busy clusters that some users wish to deploy — thousands of agents with hundreds of concurrent builds, with tens of millions of builds executed over the lifetime of the deployment.

Licensing updates

Finally, Accelerator 9.1 includes some changes to the way the product is licensed based on our experience with ElectricAccelerator Huddle, the freemium/low-end version of Accelerator that’s been in public beta for a few years. For small-to-medium-sized teams, Accelerator can be licensed by number of agents and number of concurrent builds, at a price point that I think users will find very reasonable (unfortunately I can’t disclose specific numbers here).

In addition, management of so-called “local agents” has been drastically simplified, again based on our experience with Huddle. To put it simply: local agents — any agent that is running on the same host as emake itself during a build — are now managed via the Cluster Manager, just like any other agent in the cluster. Both the CM and emake will prefer to allocate and use local agents when possible, as these tend to give better performance by avoiding network overhead.

Availability

ElectricAccelerator 9.1 is available immediately for current users via the Electric Cloud ShareFile site. For new users, contact sales@electric-cloud.com for a demo or eval download. Upgrading is recommended for all users.

As always, this release would not have been possible without the outstanding efforts of the ElectricAccelerator Engineering team at Electric Cloud. Thank you all for your contribution!

post

What factors affect build performance?

Recently a customer asked me to help them create a list of factors that affect build performance. They found themselves often tasked with explaining to their developers why one build had worse performance than another, or with finding ways to further improve the performance of a build. This is a very big, very complex question — I think perhaps much more so than they realized at first! In fact I think the question as posed is fundamentally unanswerable: I could never give an exhaustive list of the factors that affect build performance. There are too many, and there are surely some that I myself have yet to see — “unknown unknowns” as they say.

Nevertheless, there is value in making a list, even if incomplete, if for no other reason then to serve as a reference for people trying to understand or improve the performance of their builds. What follows is my attempt at creating that list, roughly in order of importance — but bear in mind that this ordering is somewhat subjective, and highly situation-dependent: your mileage may vary, and different builds will have different specific bottlenecks.

Factors that affect build performance

  1. Build size

    Builds can be measured in many ways: number of output targets, number of input files, total lines of code, aggregate bytes of output generated, etc. Generally speaking, the bigger the build is, the longer it will take to complete. If your build is long simply because of its size, you may think you have no opportunities, but that’s not so: parallel builds, caching outputs, componentization and beefier hardware can all help cope with this type of problem.

  2. Execution parallelism

    Most build tools support some form of parallel execution — GNU make’s -j option is the classic example. Assuming there is parallelizable work in the build, the build that runs on more cores (that is, with a higher -j value) will complete more quickly.

  3. Available parallelism / build structure

    Running a build on many cores only helps if there is exploitable parallelism in the build. If the build is defined in such a way that parallelism is limited, then it will take longer to complete regardless of the -j value. For example, some builds may have unnecessary serializations in the dependency graph, which will limit performance.

  4. Caching

    The use of (or failure to use) caching technology, such as ElectricAccelerator’s JobCache, ClearCase winkins, or ccache, can dramatically impact build performance. In my tests caching such as JobCache can reduce build duration by 50% or more for full builds.

  5. Conflicts (ElectricAccelerator only)

    For builds executed with ElectricAccelerator, conflicts can have a significant impact on performance. Briefly, an Accelerator conflict is any time your build “loses” a race condition between two steps that should have been serialized but weren’t due to missing dependencies in your makefile or build files. Accelerator can detect and correct such errors on-the-fly, but it comes at a cost. A few conflicts is usually not a problem, but if you have hundreds or thousands, it will make your build slower as Accelerator reruns portions of the build to get the correct results. Usually if you see such a scenario it’s a sign that you didn’t have a complete Accelerator history file for your build, so fixing such issues is as easy as using the history file generated by that build to augment the dependency information in future runs of the build.

  6. Code complexity and structure

    There are many attributes of the code itself which can affect build performance. For example, as a general rule, very long source files will take longer to compile than shorter files. Files containing very long individual functions will take longer to compile than files with only short functions. Heavy use of templates in C++ will cause slower compilation. Careless use of #include statements in C or C++ code will cause slower compilation and can be especially harmful to incremental builds by triggering excessive recompilation.

  7. Implementation language

    Some languages are easier to process and therefore faster to compile. In general, C++ is slower to compile, while languages like Java or Go are very quick to compile. Some languages require no compilation at all, so builds of code using such languages can be very very fast indeed!

  8. Build tool

    There are a staggering array of build tools which you might choose to drive your build: GNU make, ninja, maven, ant, scons, emake, tup and more. Some were designed for high performance on full builds, while others were designed for high performance on incremental builds, and still others were designed for ease-of-use, correctness, or other non-performance related attributes. The choice of build tool will affect the performance of your build, especially if your build is very large.

  9. Compiler

    For compiled languages like C and C++ there are often many different compilers that you could choose from for your build: gcc/g++, clang, icc, WindRiver, Microsoft cl, tcc, etc. These tools themselves have different performance profiles, and the performance may even vary from one version of the compiler to the next.

  10. Compiler options

    For a given compiler, the build options you enable may significantly affect the compile time. For example, when using gcc, building with -O3 is generally slower than building with -O0. Therefore for developer builds, you may consider to disable optimizations in order to reduce build cycle time. Other options that may influence compile speed include: pre-compiled headers (PCH); dependency generation (-MD, -MMD, -MF, etc); profiling or coverage analysis (-fprofile-arcs or -ftest-coverage); and include path definitions (-I), which if very long can cause the compiler to spend excessive time searching for header files.

  11. Linker

    As with the compiler, different linkers have different performance characteristics. For C/C++ compilation on Linux the default linker is GNU ld, but there are alternatives like Google gold which have much better performance, albeit for a subset of the use cases supported by GNU ld. If your use case is supported by gold, you will likely see much better build performance by switching.

  12. Memory

    As with any process involving computers, the amount of available memory will have a significant impact on build performance. Too little and your system will swap excessively. Fortunately there’s no such thing as “too much”, though it may be prohibitively expensive to get so much RAM that you can stop worrying about it. In practice most builds do not require a huge amount of memory, but if yours do, and you don’t have enough, your build speed will suffer.

  13. Disk performance

    Like memory, the performance of your disk can significantly affect build performance. In fact its easier in some ways to understand the impact of disk speed. If the build generates 10GB of output and your disk can only write at 10MB/s, the fastest that the build can possibly finish is about 1,000 seconds, or nearly 20 minutes. On the other hand, if the build generates only 5MB of output and uses the same disk, then only 1/2 second is needed to write the build outputs, so the disk is unlikely to be a bottleneck. You may find that the disk is adequate for your builds now, but as the build gets bigger you will reach a point where the disk is no longer fast enough. At that point you can upgrade to a faster disk, and that will be sufficient for some time until again your build grows to exceed the capacity of the disk.

    Even if your disk is not a primary bottleneck now, switching to a faster disk may improve performance somewhat. Many users have had good results from switching to SSD for temporary storage, or using striped RAID for those builds that generate truly enormous amounts of data.

  14. Network performance

    For distributed builds such as those executed with ElectricAccelerator, network performance is crucial because build data has to be transferred across the network. But even if the build itself is not distributed, it may make use of tools pulled from a network file share, so the network performance can affect the build.

  15. Operating system / kernel version

    Some operating systems have better performance for builds than others — in general, I’ve found builds on Linux to be relatively faster than builds on Windows, for example. Likewise, some versions of the operating system may be faster than others. Some users have reported as much as a 3x improvement by upgrading from an old version of Linux to a newer version due to optimizations in the kernel itself.

  16. Anti-virus software

    Use of anti-virus software can dramatically impact build performance, particularly if the A/V is configured in one of the more aggressive or intrusive modes of operation: sometimes every file operation is intercepted by the A/V scanner, adding a substantial drag on build speed.

  17. License management

    Some build tools, such as certain commercial compilers, require licenses in order to operate. If the license system is misconfigured it can add delays to the build process, sometimes causing each compile to take minutes instead of seconds as the compiler tries and fails to contact a license server, or contacts the wrong license server instance (for example, one on a different subnet).

A foundation for performance investigations

So there you have it: my (not entirely) comprehensive list of factors that can affect build performance. Of course these won’t all be relevant for every build: every build is different, and each has a unique performance profile. A slow disk may be mostly irrelevant for one build but absolutely critical for another. My hope is that this list will serve as a foundation for your build performance investigations — something to help get you started, even if it doesn’t get you all the way to a conclusion.

What do you think of my list? What would you add, and how would you change the ordering? Let me know in the comments below.

post

What’s new in ElectricAccelerator 9.0?

Just a couple months ago, in October 2016, we released ElectricAccelerator 9.0. This version includes some really exciting new functionality and unlocks even more amazing performance than ever before. For the first time since 2008 we added support for a new build tool: ninja, an ultra-fast new make-like build tool and the workhorse at the center of the build for both chromium and Android (yes, that Android). And we’ve continued to expand the JobCache feature — a generalization of the parse avoidance feature introduced in Accelerator 7.0. With Accelerator 9.0 you can cache more types of work, including GCC/G++ compiles, clang compiles, Microsoft cl compiles, javac and javadoc, and Google’s new Jack compiler for Java code. Even better, you can share cached results with other developers to amplify the gains across an entire team. Read on for details.

Ninja emulation

Accelerator 9.0 introduces support for the ninja-based builds. Ninja is a very interesting build tool: conceptually similar to make, but radically simplified (at least so far!). Gone are things like built-in functions, pattern rules, vpath, conditional directives, and all the other things that make it hard to parse and evaluate makefiles quickly. This enables the ninja parser to evaluate “ninja files” unbelievably quickly, but at the cost of making ninja files verbose and ill-suited for creation by hand. Instead, ninja files are typically generated from some other process, such as CMake. The benefit to the end user then is extremely fast incremental builds: for example, in Android 6.0, using the original make-based build system, a no-touch build could take as much as a minute to run even though there’s no work to be done. In Android 7.0, using the new ninja-based build system, the same build can be completed in about 5 seconds!

ElectricAccelerator’s emulation of ninja is, I think, remarkably anticlimactic: to execute a ninja build, simply invoke emake –emake-emulation=ninja. That’s it. Here’s a very simple “Hello, world!” ninja file:

1
2
3
4
rule echo
description = Building $out
command = echo "Hello, world!"
build foo: echo

And the result of running this with emake –emake-emulation=ninja:

1
2
3
4
5
6
$ emake --emake-emulation=ninja
Starting build: local-32601
Building foo
Hello, world!
Finished build: local-32601 Duration: 0:00 (m:s)
$

As I said, it’s utterly uninteresting, which, quixotically, makes it very interesting: the integration is seamless and it “just works”. Even better, by running your ninja build with ElectricAccelerator you automatically and instantly take advantage of all the advanced acceleration and correctness features you’ve come to love about Accelerator: conflict detection, history, schedule optimization, annotation, even jobcache. It all just works.

JobCache Enhancements

In Accelerator 7.0 we introduced parse avoidance, a mechanism for caching the result of makefile parsing in one build in order to accelerate subsequent builds. Once we had shown that this type of caching could dramatically improve build performance we refactored the code behind parse avoidance to create a general purpose caching framework dubbed JobCache and in subsequent releases we’ve steadily expanded the types of work to which jobcache can be applied:

  • Accelerator 7.1: jobcache for Javadoc generation
  • Accelerator 8.0: jobcache for C/C++ compiles using clang/gcc/g++ (comparable to, but better than, ccache)
  • Accelerator 8.1: jobcache for C/C++ compiles using Microsoft cl

In Accelerator 9.0 we’ve expanded the reach of jobcache in two ways. First, we added support for caching javac and Jack compiles. Next, we added shared jobcache, which enables a team of developers to leverage jobcache collectively and reliably, eliminating redundant work across the entire team.

With shared jobcache, the team designates a “blessed” or “golden” build process to populate the cache — typically the nightly or continuous integration builds. This build simply uses jobcache as normal, using –emake-assetdir to specify a location on a shared filesystem to host the cache. Then, each developer explicitly requests to use the shared cache by adding –emake-shared-assetdir to the command-line when they invoke emake, specifying the same location. Once enabled, emake uses both the shared cache and the private cache during the build. For each job that uses jobcache:

  1. Check the shared jobcache for a matching entry.
    1. If a match is found in the shared jobcache, use it. Done!
    2. If a match is not found in the shared jobcache, continue.
  2. Check the private jobcache for a matching entry.
    1. If a match is found in the private jobcache, use it. Done!
    2. If a match is not found in the private jobcache, continue.
  3. Run the job as normal
  4. Save the result to the private cache

Note that the shared cache is never written to by the developers’ builds: updates are only saved in the private cache. In this way we can ensure that developers’ builds to not litter the shared cache with one-off or user-specific cache entries. Typically we expect that developers will see very good cache hit rates against the shared cache, perhaps 95% or better, since each developer modifies only a small fraction of the total source code at once. Thus shared jobcache multiples the savings from jobcache by the size of the team.

Dynamic file patching

The final feature of interest in Accelerator 9.0 is dynamic file patching. This is a mechanism by which emake can patch files on the fly as they are referenced during the build, based on the name, size and MD5 checksum of the original. This feature enables users to tweak build scripts or makefiles in order to improve performance or compatibility with Accelerator — critical in environments where there is limited ability to modify the original files directly.

Looking forward to 9.1

Accelerator 9.0 contains some really tremendous new features: the first new build tool emulation in almost a decade; shared jobcache; on-the-fly patching for those challenging environments where no other option will do. But as always, my eye is already on the next horizon: Accelerator 9.1. We have some big plans relating to performance and ease-of-use. It will require a lot of hard work but I think we have the right team to do it. Stay tuned.

Accelerator 9.0 is available immediately for existing customers — support@electric-cloud.com to get the bits. New users can download ElectricAccelerator Huddle to take it for a test drive, or contact sales@electric-cloud.com for an evaluation of the enterprise edition.

post

What’s new in GNU make 4.2?

In May 2016 the GNU make team released GNU make 4.2. I’m pleased to see another release, though I find myself underwhelmed by both the timeline and the content of this release. When 4.1 came out just one year after 4.0 I hoped it was a sign that the GNU make project was switching to a more frequent and regular release cycle, as many software projects have done in the last several years. Although it can be a difficult adjustment this release cadence can have significant benefits like improving user engagement and reducing risk. But with the 4.2 release arriving nineteen long months after 4.1 it seems that GNU make has failed to make the transition.

Of course infrequent releases are not necessarily a problem, as long as the releases contain compelling new functionality. Unfortunately the new features in GNU make 4.2 are charitably described as “uninspiring” — though I’m sure each enhancement will be handy for the corner case it was designed to address. Of course GNU make is a mature project by any definition, and frankly it does what it does pretty well and has for a very long time — maybe it’s just “done”. But consider this: the past few years has seen something of an explosion in the build tool space, with several new build tools cropping up. Each of the following alternative build tools has had multiple releases in the last year, and each has innovative features that could be adopted by GNU make:

  • gradle, the default build tool for Android apps. Monthly releases. Reports and notifications.
  • bazel, the open-source version of Google’s internal build system. Ten releases already in 2016. Checksum-based up-to-date checks and minimization of test suite executions.
  • ninja, a make-like build tool. Two releases in the last twelve months. Resource pools and unbelievably fast parsing / low overhead.

So, what does GNU make 4.2 have to offer? Read on to see, and let me know in the comments if you disagree with my analysis.

.SHELLSTATUS variable

GNU make has had the $(shell) function for many years. This provides a mechanism by which you can get the result (stdout) of an arbitrary command into a make variable, where you can do whatever you like with it. One curious thing about $(shell) is that it doesn’t care at all whether the command you execute succeeds or not, so if you try to read a non-existent file, for example, with $(shell cat missing.txt), GNU make will simply return an empty string as the result of the shell invocation. In some cases you may want to actually check the exit status of that command, and in GNU make 4.2 you now can, by checking the value of .SHELLSTATUS, a new built-in variable that is automatically set to the exit code of the most recent $(shell) (or != assignment). Here’s a contrived example:

1
2
3
4
5
FOO := $(shell exit 1)
ifneq ($(.SHELLSTATUS),0)
$(error shell command failed!)
endif
all: ; @echo done

As you can see, it’s now possible to make your makefile react in whatever manner you deem appropriate when a shell invocation fails. Be advised, however: if you find yourself doing this, it may be an indication that your makefile is poorly written — almost every use of $(shell) is better handled by creating an actual rule to do whatever you were going to do with $(shell).

Read files with $(file)

The $(file) function was added to GNU make in the 4.0 release, in order to enable the creation of files directly from make — quite handy for those cases in which the content you want to write is so long it exceeds command-line length limits on your system. In 4.2 the $(file) function was extended so that you can use it to read files in addition to writing files. For example, SRCS := $(file <sourcelist.txt) would capture the content of the file sourcelist.txt in the variable SRCS, less the final newline in the file, if any (that last bit is for consistency with the $(shell) function).

Improved error reporting

GNU make 4.2 includes a small but very useful improvement in error reporting: previously when make encountered an error while executing a recipe, it would report only the name of the target being built, such as make: *** [all] Error 1. Starting with 4.2, this error message includes the makefile and line number of the specific command that produced the error: make: *** [Makefile:6: all] Error 1. This should make it much easier to debug large, complex builds — especially anything that uses double-colon rules to composite functionality from many fragments in distinct makefiles.

Bug fixes

In addition to the modest enhancements described above, the 4.2 release includes about three dozen other bug fixes. A glance at the resolution dates on those reveals that sometimes months passed with no updates. This makes me wonder why they didn’t cut a release at those points, even if it were just for bug fixes. My guess is that the project is trapped, in a sense: because the interval between releases is so long there’s a sense that each release has to be “perfect”, and because there’s an attempt to ensure each release is “perfect”, the interval between releases must be very long. Contrast this with a more agile approach, which can tolerate imperfect releases because the next release is just around the corner anyway. Combined with an ever expanding automated regression test suite it’s possible to gradually increase the bar for release quality, such that in fact the likelihood of a bad release goes down when compared with a project that has a long release cycle and is dependent on mostly manual testing.

GNU make isn’t going to go away any time soon, but I think the writing is on the wall: if it doesn’t start innovating again, developers will inevitably migrate to other build tools that do.

post

What’s new in GNU make 4.1?

October 2014 saw the release of GNU make 4.1. Although this release doesn’t have any really remarkable new features, the release is notable because it comes just one year after the 4.0 release — that’s the least time between releases in more than a decade. Hopefully, this is the start of a new era of more frequent, smaller releases for this venerable project which is one of the oldest still active projects in the GNU suite. Read on for notes about the new features in GNU make 4.1.

MAKE_TERMOUT and MAKE_TERMERR

Starting with 4.1, GNU make defines two additional variables: MAKE_TERMOUT and MAKE_TERMERR. These are set to non-empty values if make believes stdout/stderr is attached to a terminal (rather than a file). This enables users to solve a problem introduced by the output synchronization feature that was added in GNU make 4.0: when output synchronization is enabled, all child processes in fact write to a temporary file, even though in effect they are writing to the console. In other words, the implementation details of output synchronization may interfere with behaviors in child processes like output colorization which require a terminal for correct operation. If MAKE_TERMOUT or MAKE_TERMERR is set, then the user may explicitly direct such commands to maintain colorized output despite the fact that they appear to be writing to a file.

Enhanced $(file) function

The $(file) function was added in GNU make 4.0 to enable writing to files from a makefile without having to invoke a second process to do so. For example, where previously you had to do something like $(shell echo hello > myfile), now you can instead use $(file > myfile,foo). In theory this is more efficient, since it avoids creating another process, and it enables the user to easily write large blocks of text which would exceed command-line length limitations on some platforms.

In GNU make 4.1, the $(file) function has been enhanced such that the text to be written may be omitted from the function call. This allows $(file) to work as a sort of “poor man’s” replacement for touch, although having reviewed the bug report that resulted in this change, I think this is more an “enhancement of convenience” than a deliberate attempt to evolve the program. Of course I have to give a shout out to my friend Tim Murphy, who filed the bug report that led to this enhancement — nice work, Tim!

Relaxed constraints for mixing explicit and implicit rules

The final feature change in GNU make 4.1 is that make will emit a regular error rather than a fatal error (which terminates the build) when both explicit and pattern targets are specified as outputs of a rule, like this:

1
foo bar%: baz

This is an interesting change mostly for the high level of drama surrounding it. That bit of syntax is clearly illegal — in fact, if the pattern target is listed first rather than the explicit, GNU make has long identified this as invalid syntax, terminating the parse with *** mixed implicit and normal rules. Stop. Unfortunately, due to a defect in older versions of GNU make this construct is not prohibited when the explicit rule is named first.

In 3.82, the GNU make maintainers fixed the defect: whether or not the explicit target is named first, GNU make would identify the invalid syntax and terminate parsing. Everything was fine for about a year, and then? People flipped out. As it turns out, this construct is used by a prominant open source project: the Linux kernel. The offending syntax had been eliminated from the main development branch shortly after the 3.82 release, but third-party developers suddenly found themselves unable to build legacy versions of the kernel with the latest release of GNU make. A bug report was filed and generated 21 reponses, when the average GNU make bug report has only 3. Ultimately, the maintainers relented by reducing the severity to a non-fatal error for the 4.1 release — but with a stern message that this will likely become a fatal error again in a future release.

Bug fixes and thoughts

In addition to the bigger items identified above, the 4.1 release includes about two dozen other bug fixes. Overall, this release feels like a minor one — as often happens when release frequency increases, the individual releases become less interesting. From an agile/continuous delivery standpoint, that’s exactly what you want. But I’ve found that it is also difficult for a team that’s accustomed to less frequent releases with larger payloads to transition to smaller, more frequent releases while still incorporating large changes that take longer than one release to implement. Of course, one point does not make a line — that is, we can’t tell from this release alone whether the intention is to switch to a more frequent release cadence, or whether this release is an exception. If they are trying to increase the frequency, I think it will be very interesting to see how the GNU make development team adapts to the new cadence. Regardless, I’d like to congratulate the team for this release and I look forward to seeing what comes next.

post

HOWTO: Intro to GNU make variables

One thing that many GNU make users struggle with is predicting the value of a variable. And it’s no wonder, with the way make syntax freely mingles text intended for two very distinct phases of execution, and with two “flavors” of variables with very different semantics — except, that is, when the one seems to behave like the other. In this article I’ll run you through the fundamentals of GNU make variables so you can impress your friends (well, your nerdy friends, anyway) with your ability to predict the value of a GNU make variable at social gatherings.

Contents

Basics

Let’s start with the basics: a GNU make variable is simply a shorthand reference for another string of text. Using variables enables you to create more flexible makefiles that are also easier to read and modify. To create a variable, just assign a value to a name:

1
CFLAGS=-g -O2

Later, when GNU make sees a reference to the variable, it will replace the reference with the value of the variable — this is called expanding the variable. A variable reference is just the variable name wrapped in parenthesis or curly braces and prefixed with a dollar-sign. For example, this simple makefile will print “Hello, world!” by first assigning that text to a variable, then dereferencing the variable and using echo to print the variable’s value:

1
2
3
MSG = Hello, world!
all:
@echo $(MSG)

Creating variables

NAME = value is just one of many ways to create a variable. In fact there are at least eight ways to create a variable using GNU make syntax, plus there are built-in variables, command-line declarations, and of course environment variables. Here’s a rundown of the ways to create a GNU make variable:

  • MYVAR = abc creates the variable MYVAR if it does not exist, or changes its value if it does. Either way, after this statement is processed, the value of MYVAR will be abc.
  • MYVAR ?= def will create the variable MYVAR with the value def only if MYVAR does not already exist.
  • MYVAR += ghi will create the variable MYVAR with the value if MYVAR does not already exist, or it will append ghi to MYVAR if it does already exist.
  • MYVAR := jkl creates MYVAR if it does not exist, or changes its value if it does. This variation is just like the first, except that it creates a so-called simple variable, instead of a recursive variable — more on that in a minute.

In addition to the various assignment operators, you can create and modify variables using the define directive — handy if you want to create a variable with a multi-line value. Besides that, the define directive is equivalent to the normal VAR=VALUE assignment.

1
2
3
4
define MYVAR
abc
def
endef

If you’re using GNU make 3.82 or later, you can add assignment operators to the define directive to modify the intent. For example, to append a multi-line value to an existing variable:

1
2
3
4
define MYVAR +=
abc
def
endef

But there are still more ways to create variables in GNU make:

  • Environment variables are automatically created as GNU make variables when GNU is invoked.
  • Command-line definitions enable you to create variables at the time you invoke GNU make, like this: gmake MYVAR=123.
  • Built-in variables are automatically created when GNU make starts. For example, GNU make defines a variable named CC which contains the name of the default C compiler (cc) and another named CXX which contains the name of the default C++ compiler (g++).

Variable flavors

Now that you know how to create a GNU make variable and how to dereference one, consider what happens when you reference a variable while creating a second variable. Let’s use a few simple exercises to set the stage. For each, the answer is hidden on the line following the makefile. You can reveal the answer by highlighting the hidden text in your browser.

  1. Q1: What will this makefile print?
    1
    2
    3
    4
    ABC = Hello!
    MYVAR = $(ABC)
    all:
    @echo $(MYVAR)

    A1: Hello!

  2. Q2: What will this makefile print?
    1
    2
    3
    4
    5
    6
    ABC = Hello!
    MYVAR = $(ABC)
    all:
    @echo $(MYVAR)
    ABC = Goodbye!

    A2: Goodbye!

  3. Q3: What will this makefile print?
    1
    2
    3
    4
    5
    6
    ABC = Hello!
    MYVAR := $(ABC)
    all:
    @echo $(MYVAR)
    ABC = Goodbye!

    A3: Hello!

Don’t feel bad if you were surprised by some of the answers! This is one of the trickiest aspects of GNU make variables. To really understand the results, you have to wrap your brain around two core GNU make concepts. The first is that there are actually two different flavors of variables in GNU make: recursive, and simple. The difference between the two is in how GNU make handles variable references on the right-hand side of the variable assignment — for brevity I’ll call these “subordinate variables”:

  • With simple variables, subordinate variables are expanded immediately when the assignment is processed. References to subordinate variables are replaced with the value of the subordinate variable at the moment of the assignment. Simple variables are created when you use := in the assignment.
  • With recursive variables, expansion of subordinate variables is deferred until the variable named on the left-hand side of the assignment is itself referenced. That leads to some funny behaviors, because the value of the subordinate variables at the time of the assignment is irrelevant — in fact, the subordinate variables may not even exist at that point! What matters is the value of the subordinate variables when the LHS variable is expanded. Recursive variables are the default flavor, and they’re created when you use simply = in the assignment.

The second concept is that GNU make processes a makefile in two separate phases, and each phase processes only part of the text of the makefile. The first phase is parsing, during which GNU make interprets all of the text of the makefile that is outside of rule bodies. During parsing, rule bodies are not interpreted — only extracted for processing during the second phase: execution, or when GNU make actually starts running the commands to update targets. For purposes of this discussion, that means that the text in rule bodies is not expanded until after all the other text in the makefile has been processed, including variable assignments that physically appear after the rule bodies. In the following makefile, the text highlighted in green is processed during parsing; the text highlighted in blue is processed later, during execution. Again, to put a fine point on it: all of the green text is processed before any of the blue text:

1
2
3
4
5
6
ABC = Hello!
MYVAR = $(ABC)
all:
@echo $(MYVAR)
ABC = Goodbye!

Now the examples above should make sense. In Question 2, we created MYVAR as a recursive variable, which means the value of ABC at the time MYVAR is created doesn’t matter. By the time GNU make needs to expand MYVAR, the value of ABC has changed, so that’s what we see in the output.

In Question 3, we created MYVAR as a simple variable, so the value of ABC was captured immediately. Even though the value of ABC changes later, that change doesn’t affect the value of MYVAR.

Target-specific variables

Most variables in GNU make are global: that is, they are shared across all targets in the makefile and expanded the same way for all targets, subject to the rules outlined above. But GNU make also supports target-specific variables: variables given distinct values that are only used when expanding the recipe for a specific target (or its prerequisites).

Syntactically, target-specific variables look like a mashup of normal variable definitions, using =, :=, etc.; and prerequisite declarations. For example, foo: ABC = 123 creates a target-specific definition of ABC for the target foo. Even if ABC has already been defined as a global variable with a different value, this target-specific definition will take precedence when expanding the recipe for foo. Consider this simple makefile:

1
2
3
4
5
6
7
8
ABC = Hello!
all: foo bar
foo:
@echo $(ABC)
bar: ABC = Goodbye!
bar:
@echo $(ABC)

At first glance you might expect this makefile to print “Goodbye!” twice — after all, ABC is redefined with the value “Goodbye!” before the commands for foo are expanded. But because the redefinition is target-specific, it only applies to bar. Thus, this makefile will print one “Hello!” and one “Goodbye!”.

As noted, target-specific variables are inherited from a target to its prereqs — for example, the following makefile will print “Surprise!”, because bar inherits the target-specific value for ABC from foo:

1
2
3
4
5
6
ABC = Normal.
foo: ABC = Surprise!
foo: bar
bar:
@echo $(ABC)

You can do some neat tricks with this, but I urge you not to rely on the behavior, because it doesn’t always work the way you might think. In particular, if a target is listed as a prereq for multiple other targets, each of which have a different target-specific value for some variable, the actual value used for the prereq may vary depending on which files were out-of-date and the execution order of the targets. As a quick example, compare the output of the previous makefile when invoked with gmake foo and when invoked with gmake bar. In the latter case, the target-specific value from foo is never applied, because foo itself was not processed. With GNU make 3.82 or later, you can prevent this inheritence by using the private modifier, as in foo: private ABC = Surprise!.

Finally, note that target-specific variables may be applied to patterns. For example, a line reading %.o: ABC=123 creates a target-specific variable for all targets matching the pattern %.o.

Conclusion

If you’ve made it this far, you now know just about everything there is to know about GNU make variables. Congratulations! I hope this information will serve you well.

Questions or comments? Use the form below or hit me up on Twitter @emelski.

post

The ElectricAccelerator 7.2 “Ship It!” Award

Naturally with the release of ElectricAccelerator 7.2 a few weeks ago it’s time for another Accelerator “Ship It!” award. In keeping with our tradition, I gave each team member a LEGO figure that symbolized the release to me in some way, along with a a custom trading card giving the vital details: version, release date, and key features. Like a baseball card, the back is filled with a team roster and release statistics.

There are some great improvements in Accelerator 7.2 but there’s no particular unifying theme, so it was quite a challenge to choose a suitable minifig. One thing that stood out is that between the time management asked engineering to create a 7.2 release and the time that development was complete was only about three weeks. At the time we were actually in the midst of development on another release entirely, with a different set of new features. The 7.2 release was very much a, “Hey couldn’t you also cut a release right now while you’re at it?” And we did. Maybe it’s not as impressive as those guys that can cut a release every minute of every day, but for a team that usually does releases on a six-month cadence, a 3-week turnaround sounds like continuous delivery to me.

One thing enabled us to turn around the release that quickly: our code is (nearly) always shippable. That’s what led me to the minifig for this release: the sea captain, who’s always ready to “ship out” on short notice. Here’s the trading card that accompanied the figure:

Accelerator 7.2 "Ship It!" Card Front - click for larger version

Accelerator 7.2 “Ship It!” Card Front – click for full-size version

Accelerator 7.2 "Ship It!" Card Back - click for larger version

Accelerator 7.2 “Ship It!” Card Back – click for full-size version

Like the 7.1 card, the back of the 7.2 card incorporates stats for the current release, contextualized by stats for several previous releases:

  • Number of days in development. This is just the number of days since the previous feature release — it is assumed that whatever features are in the new release, we started working on them more-or-less after the last release went out.
  • JIRA issues closed.
  • Total KLOC. This metric gives the total size of the Accelerator code base in thousands of lines of code, as measured with the excellent Count Lines of Code utility by Al Danial. This measurement excludes comments and whitespace.
  • Change in KLOC. This is simply the arithmetic difference between the total KLOC for each release and its predecessor.

As always, my sincere gratitude goes to everybody on the Accelerator team, without whom this release would not have been. Thank you!

post

What’s new in ElectricAccelerator 7.2?

Wow, time flies! Another six months has come and gone, which means it’s time for another ElectricAccelerator feature release. Right on cue, ElectricAccelerator 7.2 dropped a couple weeks back on April 17, 2014. There’s no unifying theme to this release — actually we’re in the middle of a much more ambitious project that I can’t say much about quite yet, but over the last several months we’ve made a number of improvements to Accelerator core functionality, and we’re eager to get those out users. Thus we have the 7.2 release, with the following marquee features: dramatic Linux performance improvements for certain use cases, a key enhancement to our parse avoidance feature to improve accuracy, and expanded Linux platform support. Read on for the details.

Linux performance improvements

Accelerator 7.2 incorporates two performance improvements for Linux-based builds. The first is a redesign of the integration between the Electric File System (EFS) and the Linux kernel, which reduces lock contention in the EFS. Consequently, any build job that makes concurrent accesses to the filesystem should see some performance improvement. In one example, a build that executed two tar processes simultaneously in one job saw runtime drop from 11 minutes with Accelerator 7.1 to just 6 minutes with Accelerator 7.2, nearly 2x faster!

The second improvement is full support for the Linux d_type extension to the readdir() system call. On most Unix and Unix-like systems, the readdir() system call only gives the application programmer a couple pieces of information: the names of the files in a directory, and the inode number for those files. On Linux, filesystems may also include file type information in the results, which enables programs to operate more efficiently in some cases as they can avoid the overhead of an addition stat() call to get the file type. On a local filesystem that optimization is interesting but not necessarily game-changing; but with a distributed network filesystem like the EFS, that optimization can result in enormous improvements. In our benchmarks we saw jobs using find to scan large directory structures execute nearly 9x faster with Accelerator 7.2 versus Accelerator 7.1

Parse avoidance update

For large builds with many or complicated makefiles, Accelerator’s parse avoidance feature is a game-changer by dramatically reducing the time necessary to read and interpret makefiles at the start of a build. On the Android KitKat open-source build, parse avoidance reduces a 4 minute parse job to about 5 seconds — nearly 50x faster!. Since its introduction in Accelerator 7.0, parse avoidance has delivered jaw-dropping improvements like that in a wide variety of builds.

But use of this feature has been problematic in one specific use case: makefiles that use wildcards in prerequisite lists, with either $(wildcard) or $(shell). In certain circumstances this makefile anti-pattern could cause emake to produce “false positives” from the parse avoidance cache, such that emake would incorrectly use a previously cached parse result when it should have instead reparsed the makefile. In Accelerator 7.2 we’ve extended the #pragma cache syntax so that you can inform emake of the wildcard patterns to consider when determining cache suitability. This will enable even more users to enjoy the benefits of parse avoidance, without sacrificing reliability or performance. Usage instructions can be found in the Electric Make User’s Guide.

New platform support

Finally, with Accelerator 7.2 we’ve further expanded our already sweeping platform support to include RedHat Enterprise Linux 6.5, Ubuntu 13.04 and Windows Server 2012. This may seem like a modest increment, but I’m particularly excited about this update not for the what but for the who: you see, this is the first time that somebody other than myself made all the updates needed to support a new version of Linux, start to finish. With another set of hands to do that work we should be able to add support for new Linux platforms much more quickly in the future, which is welcome news indeed (thanks, Tim)!

What’s next?

Years ago, I thought that we would eventually get to the point that Accelerator was “done” and we’d have nothing left to do. How young and foolish I was! In reality, it seems that the TODO list only gets longer and longer. We’re still working hard on the “buddy cluster” concept, as well as Bitbake and ninja integration. And of course, we’re always working to improve performance — more on that in a future post.

ElectricAccelerator 7.2 is available immediately for existing customers. Contact support@electric-cloud.com to get the bits. New users can contact sales@electric-cloud.com for a evaluation.

post

The Twelve Days of Christmas, GNU make style

Well, it’s Christmas Day in the States today, and while we’re all recovering from the gift-opening festivities, I thought this would be the perfect time for a bit of fun with GNU make. And what better subject matter than the classic Christmas carol “The Twelve Days of Christmas”? Its repetitive structure is perfect for demonstrating how to use several of GNU make’s built-in functions for iteration, selection and sorting. This simple makefile prints the complete lyrics to the song:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
L01=Twelve drummers drumming,
L02=Eleven pipers piping,
L03=Ten lords-a-leaping,
L04=Nine ladies dancing,
L05=Eight maids-a-milking,
L06=Seven swans-a-swimming,
L07=Six geese-a-laying,
L08=Five golden rings,
L09=Four calling birds,
L10=Three french hens,
L11=Two turtle doves, and
L12=A partridge in a pear tree!
LINES=12 11 10 09 08 07 06 05 04 03 02 01
DAYS=twelfth eleventh tenth ninth \
eighth seventh sixth fifth \
fourth third second first
$(foreach n,$(LINES),\
$(if $(X),$(info ),$(eval X=X))\
$(info On the $(word $n,$(DAYS)) day of Christmas,)\
$(info my true love gave to me)\
$(foreach line,$(wordlist $n,12,$(sort $(LINES))),\
$(info $(L$(line)))))
all: ; @:

By count, most of the lines here just declare variables, one for each item mentioned in the song. Note how the items are ordered: the last item added is given the lowest index. That means that to construct each verse we simply enumerate every item in the list, in order, starting with the new item in each verse.

Line 18 is where the real meat of the makefile begins. Here we use GNU make’s foreach function to iterate through the verses. $(foreach) takes three arguments: a name for the iteration variable, a space-separated list of words to assign to the iteration variable in turn, and a body of text to expand repeatedly, once for each word in the list. Here, the list of words is given by LINES, which lists the starting line for each verse, in order — that is, the first verse starts from line 12, the second from line 11, etc. The text to expand on each iteration is all the text on lines 19-23 of the makefile — note the use of backslashes to continue each line to the next.

Line 19 uses several functions to print a blank line before starting the next verse, if we’ve printed a verse already: the $(if) function, which expands its second argument if its first argument is non-empty, and its third argument if its first argument is empty; the $(info) function to print a blank line; and the $(eval) function to set the flag variable. The first time this line is expanded, X does not exist, so it expands to an empty string and the $(if) picks the “else” branch. After that, X has a value, so the $(if) picks the “then” branch.

Lines 20 and 21 again use $(info) to print output — this time the prelude for the verse, like “On the first day of Christmas, my true love gave to me”. The ordinal for each day is pulled from DAYS using the $(word) function, which extracts a specified word, given by its first argument, from the space-separated list given as its second argument. Here we’re using n, the iteration variable from our initial $(foreach) as the selector for $(word).

Line 22 uses $(foreach) again, this time to iterate through the lines in the current verse. We use line as the iteration variable. The list of words is given again by LINES except now we’re using $(sort) to reverse the order, and $(wordlist) to select a subset of the lines. $(wordlist) takes three arguments: the index of the first word in the list to select, the index of the last word to select, and a space-separated list of words to select from. The indices are one-based, not zero-based, and $(wordlist) returns all the words in the given range. The body of this $(foreach) is just line 23, which uses $(info) once more to print the current line of the current verse.

Line 25 has the last bit of funny business in this makefile. We have to include a make rule in the makefile, or GNU make will complain *** No targets. Stop. after printing the lyrics. If we simply declare a rule with no commands, like all:, GNU make will complain Nothing to be done for `all’.. Therefore, we define a rule with a single “no-op” command that uses the bash built-in “:” to do nothing, combined with GNU make’s @ prefix to suppress printing the command itself.

And that’s it! Now you’ve got some experience with several of the built-in functions in GNU make — not bad for a Christmas day lark:

  • $(eval) for dynamic interpretation of text as makefile content
  • $(foreach), for iteration
  • $(if), for conditional expansion
  • $(info), for printing output
  • $(sort), for sorting a list
  • $(word), for selecting a single word from a list
  • $(wordlist), for selecting a range of words from a list

Now — where’s that figgy pudding? Merry Christmas!

post

UPDATE: SCons is Still Really Slow

A while back I posted a series of articles exploring the scalability of SCons, a popular Python-based build tool. In a nutshell, my experiments showed that SCons exhibits roughly quadratic growth in build runtimes as the number of targets increases:

Recently Dirk Baechle attempted to rebut my findings in an entry on the SCons wiki: Why SCons is not slow. I thought Dirk made some credible suggestions that could explain my results, and he did some smart things in his effort to invalidate my results. Unfortunately, his methods were flawed and his conclusions are invalid. My original results still stand: SCons really is slow. In the sections that follow I’ll share my own updated benchmarks and show where Dirk’s analysis went wrong.

Test setup

As before, I used genscons.pl to generate sample builds ranging from 2,000 to 50,000 targets. However, my test system was much beefier this time:

2013 2010
OS Linux Mint 14 (kernel version 3.5.0-17-generic) RedHat Desktop 3 (kernel version 2.4.21-58.ELsmp)
CPU Quad 1.7GHz Intel Core i7, hyperthreaded Dual 2.4GHz Intel Xeon, hyperthreaded
RAM 16 GB 2 GB
HD SSD (unknown)
SCons 2.3.0 1.2.0.r3842
Python 2.7.3 (system default) 2.6.2

Before running the tests, I rebooted the system to ensure there were no rogue processes consuming memory or CPU. I also forced the CPU cores into “performance” mode to ensure that they ran at their full 1.7GHz speed, rather than at the lower 933MHz they switch to when idle.

Revisiting the original benchmark

I think Dirk had two credible theories to explain the results I obtained in my original tests. First, Dirk wondered if those results may have been the result of virtual memory swapping — my original test system had relatively little RAM, and SCons itself uses a lot of memory. It’s plausible that physical memory was exhausted, forcing the OS to swap memory to disk. As Dirk said, “this would explain the increase of build times” — you bet it would! I don’t remember seeing any indication of memory swapping when I ran these tests originally, but to be honest it was nearly 4 years ago and perhaps my memory is not reliable. To eliminate this possibility, I ran the tests on a system with 16 GB RAM this time. During the tests I ran vmstat 5, which collects memory and swap usage information at five second intervals, and captured the result in a log.

Next, he suggested that I skewed the results by directing SCons to inherit the ambient environment, rather than using SCons’ default “sanitized” environment. That is, he felt I should have used env = Environment() rather than env = Environment(ENV = os.environ). To ensure that this was not a factor, I modified the tests so that they did not inherit the environment. At the same time, I substituted echo for the compiler and other commands, in order to make the tests faster. Besides, I’m not interested in benchmarking the compiler — just SCons! Here’s what my Environment declaration looks like now:

env = Environment(CC = 'echo', AR = 'echo', RANLIB = 'echo')

With these changes in place I reran my benchmarks. As expected, there was no change in the outcome. There is no doubt: SCons does not scale linearly. Instead the growth is polynomial, following an n1.85 curve. And thanks to the the vmstat output we can be certain that there was absolutely no swapping affecting the benchmarks. Here’s a graph of the results, including an n1.85 curve for comparison — notice that you can barely see that curve because it matches the observed data so well!

SCons full build runtime - click for larger view

For comparison, I used the SCons build log to make a shell script that executes the same series of echo commands. At 50,000 targets, the shell script ran in 1.097s. You read that right: 1.097s. Granted, the shell script doesn’t do stuff like up-to-date checks, etc., but still — of the 3,759s average SCons runtime, 3,758s — 99.97% — is SCons overhead.

I also created a non-recursive Makefile that “builds” the same targets with the same echo commands. This is a more realistic comparison to SCons — after all, nobody would dream of actually controlling a build with a straight-line shell script, but lots of people would use GNU make to do it. With 50,000 targets, GNU make ran for 82.469s — more than 45 times faster than SCons.

What is linear scaling?

If the performance problems are so obvious, why did Dirk fail to see them? Here’s a graph made from his test results:

SCons full build runtime, via D. Baechle - click for full size

Dirk says that this demonstrates “SCons’ linear scaling”. I find this statement baffling, because his data clearly shows that SCons does not scale linearly. It’s simple, really: linear scaling just means that the build time increases by the same amount for each new target you add, regardless of how many targets you already have. Put another way, it means that the difference in build time between 1,000 targets and 2,000 targets is exactly the same as the difference between 10,000 and 11,000 targets, or between 30,000 and 31,000 targets. Or, put yet another way, it means that when you plot the build time versus the number of targets, you should get a straight line with no change in slope at any point. Now you tell me: does that describe Dirk’s graph?

Here’s another version of that graph, this time augmented with a couple additional lines that show what the plot would look like if SCons were truly scaling linearly. The first projection is based on the original graph from 2,500 to 4,500 targets — that is, if we assume that SCons scales linearly and that the increase in build time between 2,500 and 4,500 targets is representative of the cost to add 2,000 more targets, then this line shows us how we should expect the build time to increase. Similarly, the second projection is based on the original graph between 4,500 and 8,500 targets. You can easily see that the actual data does not match either projection. Furthermore you can see that the slope of these projections is increasing:

SCons full build runtime with linear projections, via D. Baechle - click for full size

This shows the importance of testing at large scale when you’re trying to characterize the scalability of a system from empirical data. It can be difficult to differentiate polynomial from logarithmic or linear at low scales, especially once you incorporate the constant factors — polynomial algorithms can sometimes even give better absolute performance for small inputs than linear algorithms! It’s not until you plot enough data points at large enough values, as I’ve done, that it becomes easy to see and identify the curve.

What does profiling tell us?

Next, Dirk reran some of his tests under a profiler, on the very reasonable assumption that if there was a performance problem to be found, it would manifest in the profiling data — surely at least one function would demonstrate a larger-than-expected growth in runtime. Dirk only shared profiling data for two runs, both incremental builds, at 8,500 and 16,500 targets. That’s unfortunate for a couple reasons. First, the performance problem is less apparent on incremental builds than on full builds. Second, with only two datapoints it is literally not possible to determine whether growth is linear or polynomial. The results of Dirk’s profiling was negative: he found no “significant difference or increase” in any function.

Fortunately it’s easy to run this experiment myself. Dirk used cProfile, which is built-in to Python. To profile a Python script you can inject cProfile from the command-line, like this: python -m cProfile scons. Just before Python exits, cProfile dumps timing data for every function invoked during the run. I ran several full builds with the profiler enabled, from 2,000 to 20,000 targets. Then I sorted the profiling data by function internal time (time spent in the function exclusively, not in its descendents). In every run, the same two functions appeared at the top of the list: posix.waitpid and posix.fork. To be honest this was a surprise to me — previously I believed the problem was in SCons’ Taskmaster implementation. But I can’t really argue with the data. It makes sense that SCons would spend most of its time running and waiting for child processes to execute, and even that the amount of time spent in these functions would increase as the number of child processes increases. But look at the growth in runtimes in these two functions:

SCons full build function time, top two functions - click for full size

Like the overall build time, these curves are obviously non-linear. Armed with this knowledge, I went back to Dirk’s profiling data. To my surprise, posix.waitpid and posix.fork don’t even appear in Dirk’s data. On closer inspection, his data seems to include only a subset of all functions — about 600 functions, whereas my profiling data contains more than 1,500. I cannot explain this — perhaps Dirk filtered the results to exclude functions that are part of the Python library, assuming that the problem must be in SCons’ own code rather than in the library on which it is built.

This demonstrates a second fundamental principle of performance analysis: make sure that you consider all the data. Programmers’ intuition about performance problems is notoriously bad — even mine! — which is why it’s important to measure before acting. But measuring won’t help if you’re missing critical data or if you discard part of the data before doing any analysis.

Conclusions

On the surface, performance analysis seems like it should be simple: start a timer, run some code, stop the timer. Done correctly, performance analysis can illuminate the dark corners of your application’s performance. Done incorrectly — and there are many ways to do it incorrectly — it can lead you on a wild goose chase and cause you to squander resources fixing the wrong problems.

Dirk Baechle had good intentions when he set out to analyze SCons performance, but he made some mistakes in his process that led him to an erroneous conclusion. First, he didn’t run enough large-scale tests to really see the performance problem. Second, he filtered his experimental data in a way that obscured the existence of the problem. But perhaps his worst mistake was to start with a conclusion — that there is no performance problem — and then look for data to support it, rather than starting with the data and letting it impartially guide him to an evidence-based conclusion.

To me the evidence seems indisputable: SCons exhibits roughly quadratic growth in runtimes as the number of build targets increases, rendering it unusable for large-scale software development (tens of thousands of build outputs). There is no evidence that this is a result of virtual memory swapping. Profiling suggests a possible pair of culprits in posix.waitpid and posix.fork. I leave it to Dirk and the SCons team to investigate further; in the meantime, you can find my test harness and test results in my GitHub repo. If you can see a flaw in my methodology, sound off in the comments!

%d bloggers like this: