What’s new in ElectricAccelerator 6.1

Electric Cloud announced the release of ElectricAccelerator 6.1 a few weeks ago, the 24th feature release of ElectricAccelerator since the company was founded in 2002. This release incorporates several enhancements that make the product more robust and more flexible. Here are the key additions:

Workload reporting

In many large organizations, a single massive Accelerator cluster is shared across multiple development teams to reduce administration overhead and improve hardware utilization. Naturally each team has “their” agents, but when those are not in use, they can easily be made available to other teams. Often the teams using the cluster share its maintenance cost in proportion to how much each one uses it. To facilitate this use case, Accelerator 6.1 adds workload reporting. At the conclusion of each build, emake reports the total CPU usage for the build to the cluster manager. The administrator can see a summary of usage by running the “Build Users” report from the cluster manager web interface:

Example build users report.
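The post doesn't say how teams turn this report into a cost split. The sketch below is purely hypothetical. It assumes you have exported per-build records of (user, CPU-seconds) from the report, and it shows one way to compute a proportional chargeback from them:

```python
from collections import defaultdict


def cost_shares(build_records, total_cost):
    """Split total_cost across users in proportion to their CPU-seconds.

    build_records: iterable of (user, cpu_seconds) pairs, one per build.
    This record format is an assumption, not the report's actual export format.
    """
    usage = defaultdict(float)
    for user, cpu_seconds in build_records:
        usage[user] += cpu_seconds  # accumulate CPU usage per user
    total = sum(usage.values())
    if total == 0:
        # No measurable usage: nobody is charged anything.
        return {user: 0.0 for user in usage}
    return {user: total_cost * secs / total for user, secs in usage.items()}


# Hypothetical records: team-a used 4000 CPU-seconds, team-b used 1000.
records = [("team-a", 3000), ("team-b", 1000), ("team-a", 1000)]
shares = cost_shares(records, 1000.0)
print(shares)  # {'team-a': 800.0, 'team-b': 200.0}
```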

Multi-interface network support

In data centers it’s not uncommon to find servers configured with two network interfaces: for example, a gigabit connection for communication with systems outside the data center, and a 10 GigE, fiber-optic or Infiniband connection for extremely high-bandwidth communication with other systems inside the data center. Accelerator 6.1 adds explicit support for this configuration, so data transfers between agents in the cluster utilize the high-bandwidth secondary interface for increased performance.
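The post doesn't describe how Accelerator is configured to use the secondary interface. The function below is a hypothetical illustration of the general idea only. It takes a list of local interfaces, each tagged with its network and whether it is high-bandwidth, and prefers the high-bandwidth one whenever the peer agent sits on that interface's network:

```python
import ipaddress


def pick_source_address(peer, interfaces):
    """Choose a local address for talking to `peer` (hypothetical policy).

    interfaces: list of (local_address, network_cidr, is_high_bandwidth).
    Prefers a high-bandwidth interface whose network contains the peer;
    returns None if no interface is on the peer's network (use default route).
    """
    peer_ip = ipaddress.ip_address(peer)
    candidates = [
        (addr, fast)
        for addr, net, fast in interfaces
        if peer_ip in ipaddress.ip_network(net)
    ]
    if not candidates:
        return None
    # sorted() is stable; the key `not fast` puts high-bandwidth interfaces first.
    return sorted(candidates, key=lambda c: not c[1])[0][0]


# Hypothetical host with two NICs.
nics = [
    ("192.168.1.10", "192.168.1.0/24", False),  # gigabit, outside-facing
    ("10.0.0.10", "10.0.0.0/16", True),         # 10 GigE, cluster-internal
]
print(pick_source_address("10.0.3.7", nics))      # 10.0.0.10
print(pick_source_address("192.168.1.50", nics))  # 192.168.1.10
```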

Strong checksums for data integrity

For years Accelerator has relied on the checksum built into the TCP standard to ensure the integrity of network data transfers. Unfortunately that checksum, a 16-bit ones' complement sum, is relatively weak, and in rare cases we found it was insufficient to detect data errors introduced by faulty network hardware. In this release, we added an application-layer checksum to further guard against such problems. Accelerator 6.1 uses CRC32-C, chosen for its robust error detection capabilities and high performance.
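The pure-Python sketch below shows why a CRC is stronger. It is not Accelerator's implementation, which the post doesn't show. It compares CRC-32C against the 16-bit ones' complement arithmetic that TCP uses (RFC 1071, omitting the TCP pseudo-header). If you reorder 16-bit words, the ones' complement sum stays the same, but the CRC changes:

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit shifted out was set.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum (RFC 1071 arithmetic, no pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


original = b"\x01\x02\x03\x04"
swapped = b"\x03\x04\x01\x02"  # same 16-bit words, different order
print(internet_checksum(original) == internet_checksum(swapped))  # True
print(crc32c(original) == crc32c(swapped))                        # False
print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```

A production implementation would use a table-driven or hardware-accelerated CRC. Modern x86 CPUs include a CRC32 instruction that uses this polynomial. The bitwise loop is just the easiest version to read.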

Expanded platform support

Accelerator 6.1 adds support for a few new platforms and third-party tools, including:

  • Ubuntu 11.04
  • ClearCase 8

What’s next?

It feels great to have wrapped up Accelerator 6.1, but we’re not resting on our laurels. We’ve already done a lot of work for 6.2, which includes some key robustness improvements that have been literally years in the making, and we’re getting started on 7.0, which is currently planned to include some really exciting new performance enhancements around incremental builds (more on that soon).

Accelerator 6.1 is available immediately for current customers. New customers should contact sales@electric-cloud.com.

Measuring the Electric File System

Somebody asked me the other day what portion of the Electric File System (EFS) is shared code versus platform-specific code. The EFS, in case you’re not familiar with it, is a custom filesystem driver and part of ElectricAccelerator. It enables us to virtualize the filesystem, so that each build job sees the correct view of the filesystem according to its logical position in the build, even if jobs are run out of order. It also provides the file usage information that powers our conflict detection algorithm. As a filesystem driver, the EFS is obviously tightly coupled to the platforms it’s used on, and the effort of porting it is one reason we don’t support more platforms than we do now. Not that a dozen variants of Windows, 16 flavors of Linux and several versions of Solaris are anything to be ashamed of!

Anyway, the question intrigued me, and I found the answer quite surprising:

Note that here I’m measuring only actual code lines, exclusive of comments and whitespace, as of the upcoming 6.1 release. In total, the Windows version of the EFS has about 2x the lines of code of either the Solaris or the Linux port, and its platform-specific portion is more than 6x larger than theirs!
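The post doesn't say which tool produced these counts. The simple counter below shows roughly what “code lines exclusive of comments and whitespace” means for C sources. It assumes C-style `//` and `/* */` comments. To keep it short, it does not handle comment markers that appear inside string literals:

```python
def count_code_lines(source: str) -> int:
    """Count lines containing at least one non-comment, non-whitespace character."""
    in_block = False  # are we inside a /* ... */ comment?
    count = 0
    for line in source.splitlines():
        i = 0
        has_code = False
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                break  # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            count += 1
    return count


sample = (
    "int x;\n"
    "\n"
    "// comment only\n"
    "/* block comment\n"
    "   continues */ int y;\n"
    "  /* a */ /* b */\n"
    "return 0; // trailing comment\n"
)
print(count_code_lines(sample))  # 3
```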

Why is the Windows port so much bigger? To answer that question, I started looking at the historical size of the three ports, which led to the following graph showing the total lines of code for (almost) every release we’ve made. Unfortunately our first two releases, 1.0 and 2.0, have been lost to the ether at some point over the past ten years, but I was able to collect data for every release starting from 2.1:

Here you can see that the Windows port has always been larger than the others, but it’s really just a handful of Windows-specific features that blew up the footprint so much. The first of those was support for the Windows FastIO filesystem interface, an alternative “fast path” through the kernel that in theory enables higher throughput to and from the filesystem. It took us two tries to get that feature implemented, as shown on the graph, and all told that contributed about 7,000 lines of code. The addition of FastIO to the filesystem means that the Windows driver has essentially three I/O interfaces: standard I/O, memory-mapped I/O and FastIO. In comparison, on Linux and Solaris you have only two: standard I/O and memory-mapped I/O.

The second significant difference between the platforms is that on Windows the EFS has to virtualize the registry in addition to the filesystem. In the 4.3 release we significantly enhanced that portion of the driver to allow full versioning of the registry along the same lines that the filesystem has always done. That feature added about 1,000 lines of code.

I marked a couple of other points of interest on this graph as well. First, the addition of the “lightweight EFS” feature, when we switched from using RAM to store the contents of all files in the EFS to using temporary storage on the local filesystem for large files. That enabled Accelerator to handle significantly larger files, but added a fair amount of code. Finally, in the most recent release you can actually see a small decrease in the total lines of code on Solaris and Linux, which reflects the long-overdue removal of code that was needed to support legacy versions of those platforms (such as the 2.4.x Linux kernel).
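The “lightweight EFS” idea keeps small files in RAM and moves large ones to local temporary storage. This is a familiar pattern: Python's `tempfile.SpooledTemporaryFile` applies it to a single file. Below is a minimal sketch of the same policy for a collection of files. The class name and the size threshold are hypothetical. This is not the EFS implementation, which lives in a kernel driver:

```python
import os
import shutil
import tempfile


class HybridFileStore:
    """Keep small file contents in RAM; spill larger ones to a local temp dir."""

    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes           # hypothetical size cutoff
        self.tmpdir = tempfile.mkdtemp(prefix="efs-sketch-")
        self.in_memory = {}                        # name -> bytes
        self.on_disk = {}                          # name -> temp file path

    def put(self, name, data: bytes):
        # Drop any previous copy, wherever it lived.
        self.in_memory.pop(name, None)
        old_path = self.on_disk.pop(name, None)
        if old_path is not None:
            os.remove(old_path)
        if len(data) <= self.threshold:
            self.in_memory[name] = data
        else:
            fd, path = tempfile.mkstemp(dir=self.tmpdir)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            self.on_disk[name] = path

    def get(self, name) -> bytes:
        if name in self.in_memory:
            return self.in_memory[name]
        with open(self.on_disk[name], "rb") as f:
            return f.read()

    def location(self, name):
        return "memory" if name in self.in_memory else "disk"

    def close(self):
        shutil.rmtree(self.tmpdir, ignore_errors=True)


store = HybridFileStore(threshold_bytes=16)
store.put("small.h", b"#define X 1\n")   # 12 bytes: stays in RAM
store.put("big.o", b"\x00" * 1024)       # 1 KiB: spilled to disk
print(store.location("small.h"), store.location("big.o"))  # memory disk
store.close()
```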

I was also surprised by the low rate of growth in the Solaris and Linux ports. Originally the Linux port supported only one version of the Linux kernel, but today it supports about sixteen. I guess this result reveals that the difficulty in porting to each new Linux version lies not so much in the amount of code to be added as in figuring out exactly which few lines to add!

In fact, since the 4.4 release in early 2009, growth has been relatively slow on all platforms, despite the addition of major new features to Accelerator as a whole over the last several releases. The reason is simply that most feature development involves changes to other components (primarily emake), rather than to the filesystem driver.

One last metric to look at is the number of unit tests for the code in the EFS. We don’t practice test-driven development for the most part, but we do place a strong emphasis on unit testing. Developers are expected to write unit tests for every change, and we strive to keep the tests isomorphic to the code to help ensure we have good coverage. Here’s how the total number of unit tests has grown over the years:

Note that this is looking only at unit tests for the EFS! We have thousands more unit tests for the other components of Accelerator, as well as thousands of integration tests.

Thankfully for my credibility, the growth in number of tests roughly matches the growth in lines of code! But I’m surprised that the ratio of tests to code is not more consistent across the platforms — that suggests a possible area for improvement. Rather than being discouraged by that though, I’m pleased with this result. After all, what’s the point of measuring something if you don’t use that data to improve?

Another confusing conflict in ElectricAccelerator

After solving the case of the confounding conflict, my user came back with another scenario where ElectricAccelerator produced an unexpected (to him) conflict:

all:
	@$(MAKE) foo
	@cp foo bar
foo:
	@sleep 2 && echo hello world > foo

If you run this build without a history file, using at least two agents, you will see a conflict on the continuation job that executes the cp foo bar command, because that job is allowed to run before the job that creates foo in the recursive make invocation. After one run, of course, emake records the dependency in history, so later builds don’t make the same mistake.

This situation is a bit different from the symlink conflict I showed you previously. In that case, it was not obvious what caused the usage that triggered the conflict (the GNU make stat cache). In this case, it’s readily apparent: the continuation job reads (or attempts to read) foo before foo has been created. That’s pretty much a textbook example of the sort of thing that causes conflicts.
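The following toy model makes the timing concrete. It is a deliberate simplification and not emake's actual conflict detection algorithm. In the model, a job conflicts if it read a file that a job earlier in serial order did not write until after the read. The `Job` fields and timestamps are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    serial: int                                  # position in serial (gmake) order
    reads: dict = field(default_factory=dict)    # path -> time the job read it
    writes: dict = field(default_factory=dict)   # path -> time the write happened


def find_conflicts(jobs):
    """Return (writer, reader, path) for reads that happened too early (toy model)."""
    conflicts = []
    for reader in jobs:
        for writer in jobs:
            if writer.serial >= reader.serial:
                continue  # only earlier-serial writers can invalidate a read
            for path, read_time in reader.reads.items():
                write_time = writer.writes.get(path)
                if write_time is not None and write_time > read_time:
                    conflicts.append((writer.name, reader.name, path))
    return conflicts


# First run, no history: the continuation (cp foo bar) reads foo at t=0,
# but the recursive make does not create foo until t=2.
make_foo = Job("make foo", serial=1, writes={"foo": 2.0})
continuation = Job("cp foo bar", serial=2, reads={"foo": 0.0})
print(find_conflicts([make_foo, continuation]))
# [('make foo', 'cp foo bar', 'foo')]

# With the dependency recorded in history, the continuation waits and reads later.
rerun = Job("cp foo bar", serial=2, reads={"foo": 2.1})
print(find_conflicts([make_foo, rerun]))  # []
```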

What’s surprising in this example is that the continuation job is not automatically serialized with the recursive make that precedes it. In a very real sense, a continuation job is an artificial construct that we created for bookkeeping reasons internal to the implementation of emake. Logically we know that the commands in the continuation job should follow the commands in the recursive make. In fact it would be absolutely trivial for emake to just go ahead and stick in a dependency to ensure that the continuation is not allowed to start until after the recursive make finishes, thereby avoiding this conflict even when you have no history file.

Absolutely trivial to do, yes — but also absolutely wrong. Not for correctness reasons, this time, but for performance. Remember, emake is all about maximizing performance across a broad range of builds. Given a choice between two strategies that both produce correct output, emake uses the strategy that produces the best performance in the general case. For continuation jobs, that means not automatically serializing the continuation against the preceding recursive make. I could give you a wordy, theoretical explanation, but it’s easier to just show you. Suppose that your makefile looked like this instead of the original — the difference here is that the continuation job itself launches another recursive make, rather than just doing a simple cp:

all:
	@$(MAKE) foo
	@$(MAKE) bar
foo:
	@sleep 2 && echo hello world > foo
bar:
	@sleep 2 && echo goodbye > bar

Hopefully you agree that the ideal execution of this build would have both foo and bar running in parallel. Forcing the continuation job to be serialized with the preceding recursive make would choke the performance of this build. And just in case you’re thinking that emake could be really clever by looking at the commands to be executed in the continuation job, and only serializing “when it needs to”: it can’t. First, that would require emake to implement an entire shell syntax parser (or several, really, since you can override SHELL in your makefile). Second, even if emake had that ability, it would be thwarted the instant the command is something like my_custom_script.pl — there’s no way to tell what will happen when that gets invoked. It could be a simple filesystem access. It could be a recursive make. It could be a whole series of recursive makes. Even when the command is something you think you recognize, can emake really be sure? Maybe cp is not our trustworthy standard Unix cp, but something else entirely.
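One quick way to see the cost is to compute the critical path of each schedule. This sketch assumes unlimited agents and ignores scheduling overhead. The job names and durations are illustrative; the durations come from the sleep 2 in each rule:

```python
from functools import lru_cache


def makespan(durations, deps):
    """Length of the longest dependency chain, i.e. runtime with unlimited agents.

    durations: job -> seconds; deps: job -> list of jobs it must wait for.
    """
    @lru_cache(maxsize=None)
    def finish(job):
        # A job starts when its last prerequisite finishes.
        start = max((finish(d) for d in deps.get(job, [])), default=0)
        return start + durations[job]

    return max(finish(job) for job in durations)


durations = {"make foo": 2, "make bar": 2}
print(makespan(durations, {}))                          # 2: foo and bar overlap
print(makespan(durations, {"make bar": ["make foo"]}))  # 4: forced serialization
```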

Again, all is not lost for this user. If you want to avoid this conflict, you have a couple of options:

  1. Use a good history file from a previous build. This is the simplest solution. You’ll only get conflicts in this build if you run without a history file.
  2. Refactor the makefile. You can explicitly describe the dependency between the commands in the continuation job and the recursive make by refactoring the makefile so that the stuff in the continuation is instead its own target, thus taking the decision out of emake’s hands. Here’s one way to do that:
    all: do_foo
    	@cp foo bar
    do_foo:
    	@$(MAKE) foo
    foo:
    	@sleep 2 && echo hello world > foo

Either of these will eliminate the conflict from your build.

ElectricAccelerator and the Case of the Confounding Conflict

A user recently asked me why ElectricAccelerator reports a conflict in this simple build, when executed without a history file from a previous run:

all: foo symlink_to_foo
foo:
	@sleep 2 && echo hello world > foo
symlink_to_foo:
	@ln -s foo symlink_to_foo

Specifically, if you have at least two agents, emake will report a conflict between symlink_to_foo and foo, indicating that symlink_to_foo somehow read or otherwise accessed foo during execution! But ln does not access the target of a symlink when creating the symlink — in fact, you can even create a symlink to a non-existent file if you like. It seems obvious that there should be no conflict. What’s going on?

To understand why this conflict occurs, you have to wrap your head around two things. First, there’s more going on during a gmake-driven build than just the commands you see gmake invoke. That causes the usage that provokes the conflict. Second, emake considers a serial gmake build the “gold standard” — if a serial gmake build produces a particular result, so too must emake. That’s why the additional usage must result in a conflict.

In this case, the usage that triggers the conflict comes from management of the gmake stat cache. This is a gmake feature that was added to improve performance by avoiding redundant calls to stat(): once you’ve stat()’d a file, you don’t need to do it again. Unless the file changes, of course, which happens quite a lot during a build. To keep the stat cache up to date as the build progresses, gmake re-stat()’s each target after it finishes running the commands for the target. So after the commands for symlink_to_foo complete, gmake stat()’s symlink_to_foo again, using the standard stat() system call, which follows the symlink (in contrast to lstat(), which does not follow the symlink). That means gmake will actually cache the attributes of foo for symlink_to_foo.

To ensure compatibility with gmake, emake has to do the same. In Accelerator parlance, that means we get read usage on symlink_to_foo (because you have to read the symlink itself to determine the target of the symlink), and lookup usage on foo. The lookup on foo causes the conflict, because, of course, you will get a different result if you lookup foo before the job that creates it than you would get if you do the lookup after that job. Before the job, you’ll find that foo does not exist, obviously; after, you’ll find that it does.
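You can observe this behavior directly. The Python sketch below runs on Unix-like systems; creating symlinks on Windows may require extra privileges. It creates a dangling symlink and then creates the symlink's target. `os.stat()` follows the link like stat(). `os.path.lexists()` uses lstat() and does not follow it:

```python
import os
import tempfile


def stat_exists(path):
    """True if stat() (which follows symlinks) succeeds on path."""
    try:
        os.stat(path)
        return True
    except FileNotFoundError:
        return False


def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "foo")
        link = os.path.join(d, "symlink_to_foo")
        os.symlink("foo", link)  # relative link, like `ln -s foo symlink_to_foo`
        # (link exists per lstat, link exists per stat) before foo is created
        before = (os.path.lexists(link), stat_exists(link))
        with open(target, "w") as f:
            f.write("hello world\n")
        after = (os.path.lexists(link), stat_exists(link))
        return before, after


print(demo())  # ((True, False), (True, True))
```

The result of stat() on the symlink depends on whether foo exists yet. That ordering dependence is what the lookup usage on foo captures.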

But what difference does that make, really? In truth, if there’s no detectable difference in behavior, then it doesn’t matter at all. And in the example build there is no detectable difference: the build output is the same regardless of exactly when you stat() symlink_to_foo relative to when foo is created. But with a small modification to the build, it suddenly becomes possible to see the impact:

all: foo symlink_to_foo reader
foo:
	@sleep 2 && echo hello world > foo
symlink_to_foo:
	@ln -s foo symlink_to_foo
reader: foo symlink_to_foo
	@echo newer prereqs are: $?

Compare the output when this build is run serially with the output when the build is run in parallel — and note that I’m using gmake, so you can be certain I’m not trying to trick you with some peculiarity of emake’s implementation:

You can plainly see the difference: in the parallel build gmake stat()‘s symlink_to_foo before foo exists, so the stat cache records symlink_to_foo as non-existent. Then when gmake generates the value of $? for reader, symlink_to_foo is excluded, because non-existent files are never considered newer than existing files. In the serial build, gmake stat()‘s symlink_to_foo after foo has been created, so the stat cache indicates that symlink_to_foo exists and is newer than reader, so it is included in $?.
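Below is a simplified model of that $? computation. It follows the behavior described above, not gmake's actual source. Prerequisites cached as non-existent (None) are skipped. The remaining prerequisites are included if they are newer than the target, or if the target does not exist yet, as is the case for reader. The mtimes are made up:

```python
def newer_prereqs(target_mtime, prereqs, stat_cache):
    """Simplified model of $?: cached prerequisites newer than the target.

    stat_cache maps name -> mtime, or None if the cached stat() found nothing.
    target_mtime is None when the target does not exist yet.
    """
    result = []
    for name in prereqs:
        mtime = stat_cache.get(name)
        if mtime is None:
            continue  # non-existent files are never considered newer
        if target_mtime is None or mtime > target_mtime:
            result.append(name)
    return result


prereqs = ["foo", "symlink_to_foo"]
# Serial: symlink_to_foo was stat()'d after foo existed, so it carries foo's mtime.
serial_cache = {"foo": 100.0, "symlink_to_foo": 100.0}
# Parallel: symlink_to_foo was stat()'d before foo existed.
parallel_cache = {"foo": 100.0, "symlink_to_foo": None}
print(newer_prereqs(None, prereqs, serial_cache))    # ['foo', 'symlink_to_foo']
print(newer_prereqs(None, prereqs, parallel_cache))  # ['foo']
```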

Hopefully you see now both what causes the conflict, and why it is necessary. The conflict occurs because of lookup usage generated when updating the stat cache. The conflict is necessary to ensure that the build output matches that produced by a serial gmake — the “gold standard” for build correctness. If no conflict is declared, there is the possibility for a detectable difference in build output compared to serial gmake.

However, you might be thinking that although it makes sense to treat this as a conflict in the general case, isn’t it possible to do something smarter in this specific case? After all, the original example build does not use $?, and without that there isn’t any detectable difference in the build output. So why not skip the conflict?

The answer is simple, if a bit disappointing. In theory it may be possible to elide the conflict by checking to see if the symlink is used by a later job in a manner that would produce a detectable difference (for example, by scanning the commands for subsequent targets for references to $?), but in reality the logistics of that check are daunting, and I’m not confident that we could guarantee correct behavior in all cases.

Fortunately all is not lost. If you wish to avoid this conflict, you have several options:

  1. Use a good history file from a previous build. This is the most obvious solution. You’ll only get conflicts if you run without a history file.
  2. Add an explicit dependency. If you make foo an explicit prereq of symlink_to_foo, then you will avoid the conflict. Here’s how that would look:
    symlink_to_foo: foo
  3. Change the serial order. If you reorder the makefile so that symlink_to_foo has an earlier serial order than foo, you will avoid the conflict. That just requires reordering the prereqs of all:
    all: symlink_to_foo foo

Any one of these will eliminate the conflict from your build, and you’ll enjoy fast and correct parallel builds.

Case closed.

ElectricMake debug log levels

Often when analyzing builds executed with ElectricMake, all the information you need is in the annotation file — an easily digested XML file containing data such as the relationships between the jobs, the commands run, and the timing of each job. But sometimes you need more detail, and that’s where the emake debug log comes in.

To enable emake debug logging, you specify a pair of command-line arguments: --emake-debug=value, which specifies the types of debug logging to enable as a set of single-letter values, such as “jng”; and --emake-logfile=path, which specifies the location of the debug log. In this article I’ll explain each of emake’s debug log levels.

DISCLAIMER: emake debug logs are intended for use by Electric Cloud engineering and support staff. Debug logging contents and availability are subject to change in any release, for any or no reason. Enter at your own risk, your mileage may vary, etc. etc. The information in this article refers to ElectricAccelerator 6.0.

a: agent allocation

Agent allocation logging provides detailed information about emake’s attempts to procure agents from the cluster manager during the build. If you think emake may be stalled trying to acquire agents, allocation logging will help you understand what’s happening.

c: cache

Cache logging records details about the filesystem cache used by emake to accelerate parse jobs in cluster builds. For example, it logs when a directory’s contents are added to the cache, and the result of lookups in the cache. Since the cache is only used during remote parse jobs, you’ll have to enable this level with the --emake-rdebug=value option. Use cache logging if you suspect a problem with the cached local filesystem.

e: environment

Environment logging augments node logging with a dump of the entire environment block for every job as it is sent to an agent. Normally this is omitted because it’s quite verbose (could be as much as 32KB per job). Usually you’re better off using env-level annotation, which is more compact and easier to parse.

f: filesystem

Filesystem logging records numerous details about emake’s interaction with its versioned filesystem data structure. In particular, it logs every time that emake looks up a file (when doing up-to-date checks, for example), and it logs every update to the versioned filesystem caused by file usage during the build’s execution. This level of logging is very verbose, so you shouldn’t enable it as a general rule. It’s most often used when diagnosing issues related to the versioned filesystem and conflicts.

g: profiling

Profiling logging is one of the easiest to interpret and most useful types of debug logging. When enabled, emake will emit hundreds of performance metrics at the end of the build. This is a very lightweight logging level, and is safe (even advisable) to enable for all builds.

h: history

History logging prints messages related to the data tracked in the emake history file — both filesystem dependencies and autodep information. When history logging is enabled, emake will print a message every time a dependency is added to the history file, and it will print information about the files checked during up-to-date checks based on autodep data. Enable history logging if you suspect a problem with autodep behavior.

j: job

Job logging prints minimal messages related to the creation and execution of jobs. For each job you’ll see a message when it starts running, when it finishes running, and when emake checks the job for conflicts. If there is a conflict in the job, you’ll see a message about that too. If you just want a general overview of how the build is progressing, j-level logging is a good choice.

L: nmake lexer

emake uses a generated parser to process portions of nmake makefiles. Lexer debug logging enables the debug logging in that generated code. This is generally not useful to end users as it is too low-level.

l: ledger

Ledger debug logging prints information about build decisions based on data in the ledger file, as well as updates made to the ledger file. Enable it if you believe the ledger is not functioning correctly.

m: memory

When memory logging is enabled, emake will print memory usage metrics to the debug log once per second. This includes the total process memory usage as well as current and peak memory usage grouped into several “buckets” corresponding to various types of data in emake. For example, the “Operation” bucket indicates the amount of memory used to store file operations; the “Variable” bucket is the amount of memory used for makefile variables. This is most useful when you are experiencing an out-of-memory failure in emake, as it can provide guidance as to how memory is being utilized during the build, and how quickly it is growing.

n: node

Node logging prints detailed information about all messages between emake and the agents, including filesystem data and commands executed. Together with job logging, this can give a very comprehensive picture of the behavior of a build. However, node logging is extremely verbose, so you should enable it only when you are chasing a specific problem.

o: parse output

When parse output logging is enabled, emake will preserve the raw result of parsing a makefile. The result is a binary file containing information about all the targets, rules, dependencies and variables extracted from makefiles read during a parse job. This can be useful when investigating parser incompatibility issues and scheduling issues (for example, if a rule is not being scheduled for execution when you expect). Note that this debug level only makes sense when parsing, which means you have to specify it in the --emake-rdebug option. The parse results will be saved in the --emake-rlogdir directory, named as parse_jobid.out. Note that the directory may be on the local disk of the remote nodes, depending on the value you specify!

p: parse

Parse debug logging prints extremely detailed information about the reading and interpretation of makefiles during a parse job. This is most useful when investigating parser compatibility issues. This output is very verbose, so you should only enable this when pursuing a specific problem. Note that like parse output logging, this debug level only makes sense during parsing, which means you have to specify it in the --emake-rdebug option. The parse log files will be saved in the --emake-rlogdir directory, named as parse_jobid.dlog. Note that the directory may be on the local disk of the remote nodes, depending on the value you specify!

r: parse relocation

Parse relocation logging prints low-level information about the process of transmitting parse result data to emake at the end of a parse job. It’s only used internally when we are extending the parse result format, and so is unlikely to be of interest to end users.

s: subbuild

Subbuild logging prints details about decisions made while using the emake subbuild feature. You should enable it if you believe that the subbuild feature is not working correctly.

Y: authentication

Authentication logging is a subset of node logging that prints only those messages related to authenticating emake to agents and vice-versa. If you are having problems using the authentication feature, you should enable this debug level.
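If you script your builds, it can help to validate the level letters before launching emake. The helper below is hypothetical. It encodes the levels described in this article and routes c, o and p to --emake-rdebug, as the descriptions above require. The --emake-rlogdir=dir form is an assumption based on the option's name, and option syntax may differ in your release:

```python
# Debug levels described in this article (ElectricAccelerator 6.0).
DEBUG_LEVELS = {
    "a": "agent allocation", "c": "cache", "e": "environment",
    "f": "filesystem", "g": "profiling", "h": "history", "j": "job",
    "L": "nmake lexer", "l": "ledger", "m": "memory", "n": "node",
    "o": "parse output", "p": "parse", "r": "parse relocation",
    "s": "subbuild", "Y": "authentication",
}
# Levels this article says must be given via --emake-rdebug.
REMOTE_ONLY = set("cop")


def emake_debug_args(levels, logfile, rlogdir=None):
    """Build emake debug arguments (hypothetical helper, not part of emake)."""
    unknown = [c for c in levels if c not in DEBUG_LEVELS]
    if unknown:
        raise ValueError(f"unknown debug level(s): {''.join(unknown)}")
    local = "".join(c for c in levels if c not in REMOTE_ONLY)
    remote = "".join(c for c in levels if c in REMOTE_ONLY)
    args = []
    if local:
        args += [f"--emake-debug={local}", f"--emake-logfile={logfile}"]
    if remote:
        args.append(f"--emake-rdebug={remote}")
        if rlogdir is not None:
            args.append(f"--emake-rlogdir={rlogdir}")  # assumed option syntax
    return args


print(emake_debug_args("jng", "emake.dlog"))
# ['--emake-debug=jng', '--emake-logfile=emake.dlog']
```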

6 reasons your development team should be using instant messaging

The ElectricAccelerator development team sits at desks less than 30 feet apart, but despite our close proximity, we don’t often speak to one another. To an outside observer this may seem to be a sign of dysfunction in the team; after all, developers have to communicate to work effectively. Some people think we’re obviously not communicating, but the truth is that we’re not obviously communicating! That’s because we use instant messaging for most of our communications, including status updates, technical collaboration and even code reviews, rather than face-to-face conversations. I believe this has made my team more connected and more productive. Here are six reasons why instant messaging trumps face-to-face conversations for software teams.

1. Logging

The key advantage of instant messaging is that all conversations are logged automatically. As a result I’ve got records of every conversation with every member of my team for the past two years. That’s proven invaluable on a few occasions, to provide additional context for decisions made weeks or months earlier. Obviously this is not a replacement for other types of project documentation, but it is a fantastic supplement.

2. Non-intrusive

The second most important advantage of instant messaging is that it’s relatively non-intrusive, at least compared to a face-to-face conversation. We all know how important it is to get into and preserve a state of flow when programming. Spoken conversations, by social convention, command your immediate attention — effectively an interrupt of the highest order. When somebody comes to my desk to ask me something in person, they are implicitly saying, “What I have to say to you is more important than anything else you might be doing right now.” Sometimes that’s true, but many times it’s not. And yet every time somebody initiates a face-to-face conversation with me, it destroys whatever flow I might have developed.

In contrast, instant messaging allows me to defer a response until I reach a good breaking point, so people can ask questions without interrupting me.

3. Non-disruptive

Our office has an open floor plan, which means that instead of individual offices or cubicles, we have a single big room. This layout worked very well when the company had only 6 people, who were all working on the same project. Now the company employs over 100 people, with two separate development teams working on completely different products, so the open layout doesn’t work quite so well. Conversations between other people can be very distracting when you’re heads down on a tricky technical problem. By using instant messaging instead of face-to-face conversations, we significantly reduce the distraction for our colleagues.

4. Simultaneous conversations

Carrying on multiple face-to-face conversations on disparate topics is practically impossible, but doing the same via instant messenger is simple. Every IM client I’ve seen displays the last several messages of each active conversation, so you have context when a new message arrives. That significantly reduces the mental burden associated with each conversation, so it becomes possible to sustain several simultaneously. I often have five conversations “active” during the work day, and sometimes even more.

5. Consistency

Unlike face-to-face conversations, IM works well regardless of the relative locations of the conversants. That means that it doesn’t matter if my colleague is in the office with me, or working from home, or working from a customer site, or halfway around the world. I can use the same tool to communicate with them, which in turn means I don’t have to change the way I work to accommodate changes in the way they are working.

6. Versatility

One final advantage of instant messaging compared to face-to-face conversation is the versatility of the medium. I can trivially share a code fragment with somebody via IM, or a link to an online resource. Try doing that in a face-to-face conversation: “Yeah, you should check out the STL reference docs, at aich tee tee pee colon slash slash double you double you double you dot …”.

Instant messaging: give it a try

If you’re not already using instant messaging in your development team, give it a try. There are multiple free IM services out there, and there are good free IM clients on every platform, including smart phones, so you’ve really got nothing to lose — but you might gain a more efficient, productive team. It worked for us.

LEGO “Ship It!” Awards

Scriptics Connect 1.1 "Ship It!" Award

What am I supposed to do with this?


When we wrapped up the ElectricAccelerator 6.0 release recently, I wanted to give my teammates something to commemorate the release. Traditionally these are called “Ship It!” awards, and they often take the form of a Lucite plaque or trophy, or even a physical copy of the product (on DVD or CD, for example) locked inside an acrylic block. I’ve gotten a couple of those over the years, and honestly I think they’re kind of a waste. The last one I got went on a shelf to collect dust for a few years before being relocated to the trash heap, which is a shame because those things are expensive. Really expensive. I can’t even imagine the cost of the monster awards that Microsoft gives out.

So I don’t really like the usual embodiment of the “Ship It!” award, but I do really like the underlying idea. After all, shipping a software release is a significant accomplishment, the culmination of months or even years of effort by a team of smart individuals. And unlike many other human endeavors, there’s nothing tangible when you’re finished: no bridge spanning the bay nor tower reaching to the heavens. Having something to commemorate the accomplishment seems fitting, and it’s another small way that I can show my appreciation for everybody’s contributions.

The Ideal “Ship It!” Award

To me, the ideal “Ship It!” award has the following attributes:

  • Themeable: I wanted something I could customize for each release, while maintaining consistency across releases. I plan to make this a tradition.
  • Inexpensive: I wanted something I could bankroll myself, so I could retain complete creative control.
  • Compact: I wanted something that wouldn’t take up much space, so it would be portable and easy to display.
  • Geek appeal: I wanted something that my teammates would think is cool. Chunks of Lucite just don’t cut it.

LEGO “Ship It!” Awards

LEGO race car driver minifig

The winner!


After a few days of idle brainstorming and bouncing ideas off my manager and co-conspirator, I had what seemed like a great idea: LEGO minifigs. I could get a bunch of one specific LEGO minifig and give one to each person on the team. It fit all my criteria. There have been over 4,000 different minifigs released since 1978, according to The Cult of LEGO. In the last two years alone LEGO has released five minifig packs, each with 16 completely new figures, so I can count on having a unique character for every feature release for the next several years. Minifigs are cheap, too — the majority can be bought for as little as a couple of dollars each on Amazon or eBay. They’re obviously small. And of course, minifigs are dripping with geek appeal. What techie doesn’t like LEGO?

There was just one small problem. Minifigs are a little bit too small. There’s nowhere to put the information that would identify what the award represents: the product name, release version and date, and so on. A couple more days of brainstorming gave me the solution: custom baseball cards. There are several companies that will print custom baseball cards. These outfits are obviously intended for children’s sports teams, but they will happily print cards with whatever graphic you want. You just have to create images of the front and back of your card and upload them to their website. And like the minifigs themselves, the cards are inexpensive, at about $1 per card.

The ElectricAccelerator 6.0 “Ship It!” Award

For the ElectricAccelerator 6.0 “Ship It!” Award, I chose the race car driver shown above (because Accelerator is all about performance, of course!). I bought the minifigs on eBay. I spent a couple hours designing the card, then ordered them from CustomSportsProducts.com. The front shows the minifig, the product name, version, and release date, and the major new features; the back lists the names of everybody on the team. Total cost for awards for the entire team was about $40 for materials — about the cost of just one traditional Lucite-based “Ship It!” award.

I was a little nervous when I presented the awards to my team a couple weeks ago, but as it turns out I needn’t have been! The reception was overwhelmingly positive. Although I hadn’t explicitly planned it this way, the minifigs actually arrived unassembled, in individual pouches. Immediately upon getting theirs, each person dumped out the pieces and started assembly — it was practically instinctive! Several people commented out loud that the award was “Awesome!” or “Really cool,” and, of course, “Kinda nerdy, but cool!” With that kind of reaction, you can bet that I’m already planning for the next release.

And finally, here’s a picture of the ElectricAccelerator 6.0 “Ship It!” Award, as it is proudly displayed on my desk:

ElectricAccelerator 6.0 "Ship It!" Award

Who doesn't love LEGO?

post

What’s new in ElectricAccelerator 6.0

Last week Electric Cloud announced the release of ElectricAccelerator 6.0. The most exciting development in this version is the addition of ElectricAccelerator Developer Edition, which lets users accelerate their builds with the rapidly increasing horsepower of their multicore workstations, in addition to the cluster agents we’ve always supported. Even better, Developer Edition allows you to use emake to accelerate builds even when disconnected from the network, with all of the benefits you’ve come to depend on: reliable parallel builds, accurate incrementals — all that good stuff.

But there’s much more to Accelerator 6.0 than just Developer Edition. Several improvements make this the most robust, secure and easy-to-use release ever. Here are the major new features:

Electrify command-line interface overhaul

electrify is the alternate front-end to the Accelerator cluster that enables users to accelerate non-make-based builds (like SCons and JAM) as well as general-purpose cluster parallel computing tasks. Accelerator 5.0 was the first release to include electrify, but that version was relatively unpolished — a minimum viable product strategy that we hoped would allow us to prove the technology in the field and gather feedback to guide the technical direction of electrify over subsequent releases. Based on that feedback we made significant improvements to both the functionality and performance of electrify in the 5.2 and 5.4 releases, but electrify still retained the “charm” of its original clumsy user interface. At long last, we’ve taken the first big step towards addressing that in 6.0, which incorporates a significantly streamlined electrify interface. I won’t bother showing the “old clunky way” here — if you’ve been using electrify you already know, and if you haven’t, I’d rather you never see it. But I’m very pleased to show you how easy it is to accelerate a SCons build with electrify in Accelerator 6.0:

electrify --emake-cm=eacm --electrify-remote=gcc:ld scons -j 8

There are still a few warts, but then, if we fixed everything what would people complain about?

Kerberos authentication

For environments with heightened security requirements, Accelerator 6.0 supports Kerberos authentication, which verifies the identity of both the individual running emake and the agents that participate in the build. That means that emake can be certain that any agents it connects to are legitimate agents, rather than trojans set up to capture potentially sensitive information from emake. At the same time, the agents can be certain of the authenticity of the connection from emake, so malicious users cannot spoof the connection to the agent.

Note that at this time Accelerator only provides authentication of the connection between emake and an agent. It does not encrypt traffic once the connection is established.

Submake usage reporting

One thing that occasionally causes trouble for people switching to emake is the so-called submake stub problem. The issue arises when a build uses constructs like this:

foo.a:
	$(MAKE) -C sub foo.a && cp sub/foo.a ./foo.a

With emake, the recursive make invocation is not processed inline, as you might normally expect. Instead, to maximize parallel performance across all the recursive makes in the build, a make stub captures the invocation context, relays it to the top-level emake for incorporation into the overall build graph, and exits immediately. Because the cp call is part of the same command, it runs immediately after the stub, but since none of the work of the submake has actually executed yet, the cp will fail.

Correcting this requires a trivial makefile change. Instead of chaining the recursive make and the cp together in a single command, this rule can be rewritten to explicitly use two distinct commands. With this minor adjustment, emake is able to treat the cp as a separate job that logically occurs after the work in the recursive make:

foo.a:
	$(MAKE) -C sub foo.a
	cp sub/foo.a ./foo.a
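To make the failure mode concrete, here is a toy Python model of the two rules. It is a sketch under simplifying assumptions, not how emake is implemented: the hypothetical `run_rule` function treats each inner list as one command line, runs steps joined by `&&` back to back, lets the stubbed submake merely record its work, and lets that recorded work finish before the next command line starts (standing in for emake scheduling the later commands after the submake's work).

```python
import os
import tempfile

def run_rule(command_lines, workdir):
    """Toy model of one make rule under a submake stub (not emake itself).
    command_lines: list of command lines; each is a list of steps joined by &&.
    Returns True if the rule succeeds."""
    os.makedirs(os.path.join(workdir, "sub"), exist_ok=True)
    target = os.path.join(workdir, "sub", "foo.a")   # built by the submake
    copy = os.path.join(workdir, "foo.a")            # destination of cp
    deferred = []                                    # work relayed to top-level emake

    def make_stub():
        # The stub records the submake's work and exits immediately.
        deferred.append(lambda: open(target, "w").close())
        return True

    def cp():
        if not os.path.exists(target):
            return False                             # sub/foo.a does not exist yet
        with open(target, "rb") as src, open(copy, "wb") as dst:
            dst.write(src.read())
        return True

    steps = {"make": make_stub, "cp": cp}
    for line in command_lines:
        for step in line:                            # && chain: runs back to back
            if not steps[step]():
                return False
        # Model assumption: before the next command line, the relayed submake
        # work finishes, as if later commands were a job ordered after it.
        while deferred:
            deferred.pop(0)()
    return True

with tempfile.TemporaryDirectory() as d:
    print(run_rule([["make", "cp"]], d))    # chained with &&: False
with tempfile.TemporaryDirectory() as d:
    print(run_rule([["make"], ["cp"]], d))  # two separate commands: True
```

The only difference between the two calls is where the command boundary falls, which mirrors the one-line makefile change above.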

Although it’s generally straightforward to remedy the problem, it’s not always easy to determine that a build has run into a submake stub problem. Thus in Accelerator 6.0 we added a feature that will make it easier to identify these situations: if you enable file- or lookup-level annotation detail, emake will record an explicit submake usage operation in the job that invoked the recursive make, so you can tell exactly when the submake occurs relative to other file usage in the job. In the example above, the usage log for the job will contain something like this:

<op type="submake" file="/home/ericm/build/sub"/>
<op type="lookup" file="/home/ericm/build/sub/foo.a" found="0"/>

The presence of file usage after the submake operation is a warning flag that the job may have a submake stub vulnerability.
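One way to apply that rule of thumb mechanically is to scan a job's operations for file references that follow a submake op. The Python sketch below assumes the `<op>` elements for a single job are wrapped in a hypothetical `<job>` element; the real annotation file has more structure around them, so treat the parsing as illustrative. `submake_stub_suspects` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

def submake_stub_suspects(job_xml):
    """Return files a job touched after its first submake op (assumed format:
    one <job> element containing <op> elements in execution order)."""
    job = ET.fromstring(job_xml)
    seen_submake = False
    suspects = []
    for op in job.iter("op"):
        if op.get("type") == "submake":
            seen_submake = True
        elif seen_submake and op.get("file") is not None:
            suspects.append(op.get("file"))
    return suspects

sample = """<job>
<op type="submake" file="/home/ericm/build/sub"/>
<op type="lookup" file="/home/ericm/build/sub/foo.a" found="0"/>
</job>"""
print(submake_stub_suspects(sample))  # ['/home/ericm/build/sub/foo.a']
```

A non-empty result is only a hint: a job may legitimately touch files after a submake, so each hit still needs a human look at the rule.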

Optionally enable directory read conflicts

Another long-standing thorn in the side for users switching to emake is builds that depend on strict accuracy of directory listing operations. Briefly, although emake can ensure that directory listings are 100% correct throughout the build, for performance reasons emake deliberately ignores the directory read conflicts that would provide that assurance. For a thorough explanation of the problem and the rationale for that design decision, see my previous post on Exceptions to conflict detection in ElectricMake. For now, suffice it to say that Accelerator 6.0 includes a new command-line option that forces emake to honor directory read conflicts: --emake-readdir-conflicts=1.

Agent log rotation

Last, but certainly not least, after only 10 years Accelerator finally incorporates a feature that’s been standard on Unix operating systems probably since the dawn of the epoch: the Accelerator agent now monitors the size of its debug log, and automatically rolls over to a new logfile when the log gets too big. That’s sure to be welcome news to anybody that’s had to enable agent debug logging while working with our support team.

Looking forward

As you can see, there are a lot of great improvements in Accelerator 6.0. As always, I’m tremendously proud of the effort my engineering team has put into the product. Of course, a product like Accelerator is never really done. We’re already working on the next release. Stay tuned for updates on what we’re doing next.

Accelerator 6.0 is available immediately for current customers — contact support@electric-cloud.com for details. New users contact sales@electric-cloud.com.

post

2011 Customer Summit “Conflicts” Handout

As promised, here’s the corrected handout to accompany the presentation I gave at the 2011 Electric Cloud Customer Summit, “Conflict Detection in ElectricMake”. This content is also available as a pair of blog articles:

  1. How ElectricMake guarantees reliable parallel builds
  2. Exceptions to conflict detection in ElectricMake

Thanks to everybody who came to the talk! I really enjoyed giving it and answering your questions. Looking forward to seeing you next year!

post

Exceptions to conflict detection in ElectricMake

In a previous article I covered the basic conflict detection algorithm in ElectricMake. It’s surprisingly simple, which is one of its strengths. But if ElectricMake strictly adhered to the simple definition of a conflict, many builds would be needlessly serialized, sapping performance. Over the years we’ve made a variety of tweaks to the core algorithm, adding support for special cases to improve performance. Here are some of those special cases.

Non-existence conflicts

One obvious enhancement is to ignore conflicts when the two versions are technically different, but effectively the same. The simplest example is when there are two versions of a file which both indicate non-existence, such as the initial version and the version created by job C in this chain for file foo:

Suppose that job D, which falls between C and E in serial order, runs before any other jobs finish. At runtime, D sees the initial version, but strictly speaking, if it had run in serial order it would have seen the version created by job C. But the two versions are functionally identical — both indicate that the file does not exist. From the perspective of the commands run in job D, there is no detectable difference in behavior regardless of which of these two versions was used. Therefore emake can safely ignore this conflict.
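The rule can be expressed as a small comparison function. This is a Python sketch with a hypothetical `Version` record (which job wrote it, whether it says the file exists, and its content); it illustrates the idea rather than emake's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Version:
    """One version of a file in a hypothetical model of a version chain."""
    writer: Optional[str]          # job that produced it; None for the initial version
    exists: bool                   # False means "this file does not exist"
    content: Optional[bytes] = None

def is_conflict(seen: Version, serial: Version) -> bool:
    """A job conflicts if the version it saw at runtime differs from the one it
    would have seen in serial order, unless both versions mean 'no such file'."""
    if not seen.exists and not serial.exists:
        return False               # functionally identical: both say "absent"
    return seen != serial

initial = Version(writer=None, exists=False)
by_c = Version(writer="C", exists=False)                  # C's version also says "absent"
print(is_conflict(initial, by_c))                         # False
print(is_conflict(initial, Version("C", True, b"data")))  # True
```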

Directory creation conflicts

A common make idiom is mkdir -p $(dir $@) — that is, create the directory that will contain the output file, if it doesn’t already exist. This idiom is often used like so:

$(OUTDIR)/foo.o: foo.cpp
	@mkdir -p $(dir $@)
	@g++ -c -o $@ $^

Suppose that the directory does not exist when the build starts, and several jobs that employ this idiom start at the same time. At runtime they will each see the same filesystem state — namely, that the output directory does not exist. Each job will therefore create the directory. But in reality, had these jobs run serially, only the first job would have created the directory; the others would have seen the version created by the first job, and done nothing with the directory themselves. According to the simple definition of a conflict, all but the first (serial order) job would be considered in conflict. For builds without a history file expressing the dependency between the later jobs and the first, the performance impact would be disastrous.

Prior to Accelerator 5.4, there were two options for avoiding this performance hit: use a good history file, or arrange for the directories to be created before the build runs. Accelerator 5.4 introduced a refinement to the conflict detection algorithm which enables emake to suppress the conflict between jobs that both attempt to create the same directory, so even builds with no history file will not get conflicts in this scenario, without sacrificing correctness. (NB: you need not take special action to enjoy the benefits of this improvement).
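The following Python sketch replays that scenario under the strict rule and with the refinement. It assumes every job ran only `mkdir -p` on the shared output directory and that all of them started before any finished; `conflicting_jobs` and `suppress_mkdir` are illustrative names, not emake internals.

```python
from typing import List, Optional

def conflicting_jobs(jobs: List[str], suppress_mkdir: bool) -> List[str]:
    """jobs are listed in serial order; each ran `mkdir -p out` and, because
    all of them started before any finished, each saw `out` as absent."""
    conflicts = []
    creator: Optional[str] = None      # job that creates `out` in serial order
    for job in jobs:
        saw_absent = True              # runtime view shared by every job
        serial_absent = creator is None
        if creator is None:
            creator = job              # serially, the first job creates the dir
        if saw_absent == serial_absent:
            continue                   # saw exactly what serial order implies
        if suppress_mkdir:
            continue                   # both this job and `creator` only create
                                       # the same directory; the outcome matches
        conflicts.append(job)
    return conflicts

jobs = ["A", "B", "C", "D"]
print(conflicting_jobs(jobs, suppress_mkdir=False))  # ['B', 'C', 'D']
print(conflicting_jobs(jobs, suppress_mkdir=True))   # []
```

Under the strict rule every job after the first would be rerun serially; with the refinement none are.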

Appending to files

Another surprisingly common idiom is to append error messages to a log file as the build proceeds:

$(OUTDIR)/foo.o: foo.cpp
	@g++ -c -o $@ $^ 2>> err.log

Implicitly, each append operation is dependent on the previous appends to the file — after all, how will you know which offset the new content should be written to if you don’t know how big the file was to begin with? In terms of file versions, you can imagine a naive implementation treating each append to the file as creating a complete new version of the file:

The problem, of course, is that you’ll get conflicts if you try to run all of these jobs in parallel. Suppose all three jobs, A, B and C, start at the same time. They will each see the initial version, an empty file, but if run serially, only A would have seen that version. B would have seen the version created by A; C would have seen the version created by B.

This example is particularly interesting because emake cannot sort this out on its own: as long as the usage reported for err.log is the very generic “this file was modified, here’s the new content” message normally used for changes to the content of an existing file, emake has no choice but to declare conflicts and serialize these jobs. Fortunately, emake is not limited to that simple usage record. The EFS can detect that each modification is strictly appending to the file, with no regard to the prior contents, and include that detail in the usage report. Thus informed, emake can record fragments of the file, rather than the entire file content:

Since emake now knows that the jobs are not dependent on the prior content of the file, it need not declare conflicts between the jobs, even if they run in parallel. As emake commits the modifications from each job, it stitches the fragments together into a single file, with each fragment in the correct order relative to the other pieces.
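The stitching step can be sketched as committing jobs strictly in serial order while they finish in any order. In this Python sketch (hypothetical `commit_appends` helper, fragments as bytes), a job's fragment is held until all of its serial predecessors have been committed.

```python
def commit_appends(serial_order, completion_order):
    """serial_order: job names in serial order.
    completion_order: (job, appended_fragment) pairs in the order jobs finished.
    Returns the final file content with fragments in serial order."""
    pending = {}
    log = bytearray()                  # err.log starts out empty
    next_job = 0
    for job, fragment in completion_order:
        pending[job] = fragment
        # Commit every job whose serial predecessors are already committed.
        while next_job < len(serial_order) and serial_order[next_job] in pending:
            log += pending.pop(serial_order[next_job])
            next_job += 1
    return bytes(log)

done = [("C", b"c: warning\n"), ("A", b"a: error\n"), ("B", b"b: error\n")]
print(commit_appends(["A", "B", "C"], done))
# b'a: error\nb: error\nc: warning\n'
```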

Directory read conflicts

Directory read operations are interesting from the perspective of conflict detection. Consider: what does it mean to read a directory? The directory has no content of its own, not in the way that a file does. Instead, the “content” of a directory is the list of files in that directory. To check for conflicts on a directory read, emake must check whether the list of files that the reader job actually saw matches the list that it would have seen had it run in serial order — in essence, doing a simple conflict check on each of the files in the directory.

That’s conceptually easy to do, but the implications of doing so are significant: it means that emake will declare a conflict on the directory read any time any other job creates or deletes any file in that directory. Compare that to reads on ordinary files: you only get a conflict if the read happens before a write operation on the same file. With directories, you can get a conflict for modifications to other files entirely.

This is particularly dangerous because many tools actually perform directory reads under-the-covers, and often those tools are not actually concerned with the complete directory contents. For example, a job that enumerates files matching *.obj in a directory is only interested in files ending with .obj. The creation of a file named foo.a in that directory should not affect the job at all.
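A strict directory-read check boils down to comparing two sets of names. This Python sketch (hypothetical `readdir_conflict` helper) shows why the *.obj example gets flagged even though foo.a is irrelevant to the reader.

```python
from typing import Iterable

def readdir_conflict(seen: Iterable[str], serial: Iterable[str]) -> bool:
    """Strict check: the names the reader saw must equal the names it would have
    seen in serial order (a per-file existence check on every entry)."""
    return set(seen) != set(serial)

# The reader job only cares about *.obj files...
seen = ["a.obj", "b.obj"]
# ...but a job earlier in serial order created foo.a in the same directory.
serial = ["a.obj", "b.obj", "foo.a"]
print(readdir_conflict(seen, serial))  # True
```

Nothing in a plain directory read tells emake that the reader only wanted *.obj, so the check cannot narrow itself to the files the job cares about.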

Another nasty example comes from utilities that implement their own version of the getcwd() system call. If you’re going to roll your own version, the algorithm looks something like this:

  1. Let cwd = “”
  2. Let current = “.”
  3. Let parent = “./..”
  4. stat current to get its inode number. If current and parent are the same directory, current is the root, so stop.
  5. Read parent until an entry matching that inode number is found.
  6. Prepend “/” and the name from that entry to cwd.
  7. Set current = parent.
  8. Set parent = parent + “/..”
  9. Repeat starting with step 4.
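Here is the same walk in Python, meant to illustrate how many directory reads such a tool performs, not to reproduce any particular utility. It spells out two details: it compares (device, inode) pairs rather than inode numbers alone, so the walk survives mount points, and it stops when a directory's `..` is the directory itself, which is how the root is recognized. `my_getcwd` is a hypothetical name.

```python
import os

def my_getcwd():
    """Rebuild the current directory's absolute path by walking up '..' links,
    using only stat() and directory reads (never calling os.getcwd())."""
    cwd = ""
    current = "."
    parent = os.path.join(current, "..")
    while True:
        cur = os.stat(current)
        par = os.stat(parent)
        # At the root, '..' refers to the directory itself.
        if (cur.st_dev, cur.st_ino) == (par.st_dev, par.st_ino):
            return cwd or "/"
        # Directory read on `parent`: find the entry that names `current`.
        with os.scandir(parent) as entries:
            for entry in entries:
                st = entry.stat(follow_symlinks=False)
                if (st.st_dev, st.st_ino) == (cur.st_dev, cur.st_ino):
                    cwd = "/" + entry.name + cwd   # prepend this component
                    break
            else:
                raise FileNotFoundError("current directory not found in parent")
        current = parent
        parent = os.path.join(parent, "..")

print(my_getcwd())
```

Every iteration lists one ancestor directory, which is exactly the pattern that would tie such a job to every file creation or deletion above it.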

By following this algorithm the program can construct an absolute path for the current working directory. The problem is that the program has a read operation on every directory between the current directory and the root of the filesystem. If emake strictly adhered to conflict checking on directory reads, a job that used such a tool would be serialized against every job that created or deleted any file in any of those directories.

For this reason, emake deliberately ignores conflicts on directory read operations by default. Surprisingly, this is safe most of the time: often tools do not need a completely accurate list of the files in the directory. And in every case I’ve seen, even if the tool does require a perfectly correct list, the tool follows the directory read with reads of the files it finds. That means you can ensure correct behavior by running the build once with a single agent, so that the directory contents are correct when each job runs. That run will produce history based on the file reads, so subsequent builds can run with many agents and still produce correct results.

Starting with Accelerator 6.0, you can also use --emake-readdir-conflicts=1 to force emake to honor directory read conflicts.

Conclusion

Getting parallel builds that are fast is easy: just add -j to your make invocation. Getting parallel builds that are both fast and reliable is another story altogether. As you’ve seen, the core conflict detection algorithm in ElectricMake is simple, but after many years and hundreds of thousands of builds, we’ve enhanced that simple algorithm in a variety of special cases to provide even better performance. Future releases of ElectricAccelerator will include even more refinements to the algorithm.

