post

ElectricAccelerator job statuses, or, what the heck is a skipped job?

If you’ve ever looked at an annotation file generated by Electric Make, you may have noticed the status attribute on jobs. In this article I’ll explain what that attribute means so that you can better understand the performance and behavior of your builds.

Annotation files and the status attribute

An annotation file is an XML-enhanced version of the build log, optionally produced by Electric Make as it executes a build. In addition to the regular log content, annotation includes details like the duration of each job and the dependencies between jobs. the status attribute is one of several attributes on the <job> tag:

<job id="J00007fba40004240"  status="conflict" thread="7fba4a7fc700" type="rule" name="a" file="Makefile" line="6" neededby="J00007fba400042e0">
<conflict type="file" writejob="J00007fba400041f0" file="/home/ericm/src/a" rerunby="J0000000002a27890"/>
<timing invoked="0.512722" completed="3.545985" node="ericm15-2"/>
</job>

The status attribute may have one of five values:

  • normal
  • reverted
  • skipped
  • conflict
  • rerun

Let’s look at the meaning of each in detail.

normal jobs

Normal jobs are just that: completely normal. Normal jobs ran as expected and were later found to be free of conflicts, so their outputs and filesystem modifications were incorporated into the final build result. Note that normal is the default status, so it’s usually not explicitly specified in the annotation. That is, if a job does not have a status attribute, its status is normal. If you run the following makefile with emake, you’ll see the all job has normal status:

1
all: ; @echo done

Here’s the annotation for the normal job:

<job id="Jf3502098" thread="f46feb40" type="rule" name="all" file="Makefile" line="1">
<command line="1">
<argv>echo done</argv>
<output src="prog">done
</output>
</command>
<timing weight="0" invoked="0.333677" completed="0.343138" node="chester-1"/>
</job>

reverted and skipped jobs

Reverted and skipped jobs are two sides of the same coin. In both cases, emake has determined that running the job is unnecessary, either because of an error in a serially earlier job, or because a prerequisite of the job was itself reverted or skipped. Remember, emake’s goal is to produce output that is identical to a serial GNU make build. In that context, barring the use of –keep-going or similar features, jobs following an error would not be run, so to preserve compatibility with that baseline, emake must not run those jobs either — or at least emake must appear not to have run those jobs.

That brings us to the sole difference between reverted and skipped jobs. Reverted jobs had already been executed (or had at least started) at the point when emake discovered the error that would have caused them not to run. Any output produced by a reverted job is discarded, so it has no effect on the final output of the build. In contrast, skipped jobs had not yet been started when the error was discovered. Since the job had not yet run, there is not output to discard.

Running the following simple makefile with at least two agents should produce one reverted job, b, and one skipped job, c.

1
2
3
4
5
6
7
all: a b c
a: ; @sleep 2 && exit 1
b: ; @sleep 2
c: b ; @echo done

Here’s the annotation for the reverted and skipped jobs:

<job id="Jf3502290"  status="reverted" thread="f3efdb40" type="rule" name="b" file="Makefile" line="5" neededby="Jf35022f0">
<timing weight="0" invoked="0.545836" completed="2.558411" node="chester-1"/>
</job>
<job id="Jf35022c0"  status="skipped" thread="0" type="rule" name="c" file="Makefile" line="7" neededby="Jf35022f0">
<timing weight="0" invoked="0.000000" completed="0.000000" node=""/>
</job>

conflict jobs

A job has conflict status if emake detected a conflict in the job: the job ran too early, and used a different version of a file than it would have had it run in the correct serial order. Any output produced by the job is discarded, since it is probably incorrect, and a rerun job is created to replace the conflict job. The following simple makefile will produce a conflict job if run without an emake history file, and with at least two agents:

1
2
3
4
5
all: a b
a: ; @sleep 2 && echo hello > a
b: ; @cat a

Here’s the annotation for the conflict job:

<job id="Jf33021d0"  status="conflict" thread="f44ffb40" type="rule" name="b" file="Makefile" line="5" neededby="Jf3302200">
<conflict type="file" writejob="Jf33021a0" file="/home/ericm/blog/melski.net/job_status/tmp/a" rerunby="J09b1b3c8"/>
<timing weight="0" invoked="0.541597" completed="0.552053" node="chester-2"/>
</job>

rerun jobs

A rerun job is created to replace a conflict job, rerunning the commands from the original conflict job but with a corrected filesystem context, to ensure the job produces the correct result. There are a few key things to keep in mind when you’re looking at rerun jobs:

  • By design, rerun jobs are executed after any serially earlier jobs have been verified conflict-free and committed to disk. That’s a consequence of the way that emake detects conflicts: each job is checked, in strict serial order, and committed only if it has not conflict. If a job has a conflict it is discarded as described above, and a rerun job is created to redo the work of that job.
  • It is impossible for a rerun job to have a conflict, since it is guaranteed not to run until all preceeding jobs have finished. In fact, emake does not even bother to check for conflicts on rerun jobs.
  • Rerun jobs are executed immediately upon being created, and while the rerun job is running emake will not start any other jobs. Any jobs that were already running when the rerun started are allowed to finish, but new jobs must wait until the rerun completes. Although this impairs performance in some cases, this conservation strategy helps to avoid chains of conflicts that would be even more detrimental to performance. Of course you typically won’t see conflicts and reruns if you run your build with a good history file, so in practice the performance impact of rerun jobs is immaterial.

The following simple makefile will produce a rerun job, if run without a history file and using at least two agents (yes, this is the same makefile that we used to demonstrate a conflict job!):

1
2
3
4
5
all: a b
a: ; @sleep 2 && echo hello > a
b: ; @cat a

And here’s the annotation fragment for the rerun job:

<job id="J09b1b3c8"  status="rerun" thread="f3cfeb40" type="rule" name="b" file="Makefile" line="5" neededby="Jf3302200">
<command line="5">
<argv>cat a</argv>
<output src="prog">hello
</output>
</command>
<timing weight="0" invoked="2.283755" completed="2.293563" node="chester-1"/>
</job>

Job status in ElectricAccelerator annotation

To the uninitiated, ElectricAccelerator job status types might seem cryptic and mysterious, but as you’ve now seen, there’s really not much to it. I hope you’ve found this article informative. As always, if you have any questions or suggestions, hit the comments below. And don’t forget to checkout Ask Electric Cloud if you are looking for help with Electric Cloud tools!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: