How long are the jobs in my build?

I’ve been playing with a new visualization for build data. I was looking for a way to really hammer home the point that in most builds, the vast majority of jobs are more-or-less the same length. The “Job Count by Length” report in ElectricInsight does the same thing, but in a “just the facts” manner. I wanted something that would be more visceral.

Then I hit on the idea of mapping the jobs onto a surface plot, using the job duration as the z-coordinate or “height”, so longer jobs have points high above the x-y plane. In such a view, we would expect to see a mostly flat plain, with a small portion of points above the plain. Sure enough, that’s just what we get. Here’s an example, generated using data from a Mozilla build:

Here’s what I like about this visualization:

  • Nails the primary goal. This visualization is great at demonstrating that most jobs in the build have about the same duration.
  • It looks cool. Given a choice between two visualizations that show the same data, the one that looks cooler definitely has an advantage.

Now, here’s what I don’t like about this visualization:

  • X- and Y-coordinates are arbitrary. For this prototype I just determined the smallest box large enough to show all the jobs in the build, then plotted the first job at 0,0; the second at 0,1, etc. This is simple, and it gives a compact display, but it would be nice if the X- and Y-coordinates had some actual meaning.
  • It’s hard to tell what Z-coordinate any given point has. For example, I can easily see that the vast majority of jobs have roughly the same duration, but what duration is that? 0 seconds? 1 second? 1/2 second?
  • A dense upper layer obscures lower layers. Although this build is unimodal, suppose it were bimodal instead: the density of points at height 5 might obscure the existence of points at height 3.
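One rough way to answer the “what duration is that?” question, and to check for a second mode that the plot might hide, is to bucket the job lengths and count them. Here is a minimal Python sketch of that idea. It is not how ElectricInsight builds its report; the function names and the 0.5-second bucket width are assumptions made up for illustration.

```python
from collections import Counter
import math


def bucket_counts(lengths, width=0.5):
    """Count jobs per duration bucket.

    Bucket i covers durations in [i * width, (i + 1) * width) seconds.
    Returns a dict mapping bucket index to job count, sorted by index.
    """
    counts = Counter(math.floor(length / width) for length in lengths)
    return dict(sorted(counts.items()))


def dominant_bucket(lengths, width=0.5):
    """Return (low, high, count) for the most heavily populated bucket."""
    counts = bucket_counts(lengths, width)
    index = max(counts, key=counts.get)
    return (index * width, (index + 1) * width, counts[index])


if __name__ == "__main__":
    # Hypothetical job lengths in seconds, not data from a real build.
    sample = [0.1, 0.2, 0.4, 0.6, 3.2]
    for index, count in bucket_counts(sample).items():
        print(f"{index * 0.5:.1f}-{(index + 1) * 0.5:.1f}s: {count}")
```

A bimodal build would show up here as two well-populated buckets, even if one layer of points hides the other in the 3D view.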

For comparison, here’s the “Job Count by Length” report from ElectricInsight. It uses the same data, and tells the same story, but it’s not nearly as visually dramatic:

So, what do you think? Any ideas how I could use the X- and Y-coordinates to convey useful information? Keep reading if you want to see how I made this visualization.

How did I make it?

This visualization was really easy to make, because I leveraged other tools to do all of the hard work for me. The first was ElectricAccelerator, which I used to generate an annotated build log containing timing information for every job in my build (you could also use SparkBuild). Next, I used annolib, a library for processing and analyzing annotation files, to convert the data into x,y,z tuples. I wrote this trivial annolib script:

#!tclsh
load annolib.so

# Open up the annotation file, specified on the command-line.
set xml [open [lindex $argv 0] r]

# Create an anno object, and use it to parse the annotation data.
set anno [anno create]
$anno load $xml

# Figure out how big the plane needs to be to show every job.
set dim [expr {int(ceil(sqrt([$anno jobcount])))}]
set row 0
set col 0

# Iterate through the jobs, printing (to stdout) the row, column,
# and length of each.
set begin [$anno jobs begin]
set end [$anno jobs end]
for {set job $begin} {$job != $end} {set job [$anno job next $job]} {
    puts "$row $col [$anno job length $job]"
    incr col
    if { $col == $dim } {
        # Hit the end of a row, start over on the next.
        set col 0
        incr row
    }
}
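If you don’t have annolib handy, the layout step is easy to reproduce with any list of job durations. The following Python sketch mirrors the row/column logic of the script above; it assumes you already have the durations in a list, and the name `grid_layout` is made up for illustration.

```python
import math


def grid_layout(lengths):
    """Place jobs on the smallest square grid that holds them all.

    Jobs fill the grid row by row, so the first job lands at (0, 0),
    the second at (0, 1), and so on. Returns (row, col, length) tuples.
    """
    if not lengths:
        return []
    # Integer form of ceil(sqrt(n)) for n >= 1, avoiding float rounding.
    dim = math.isqrt(len(lengths) - 1) + 1
    return [(i // dim, i % dim, length) for i, length in enumerate(lengths)]


if __name__ == "__main__":
    # Hypothetical durations in seconds, not data from a real build.
    for row, col, length in grid_layout([0.5, 0.5, 0.5, 2.0, 0.5]):
        # Same "row col length" layout that the Tcl script prints.
        print(row, col, length)
```

Five jobs need a 3x3 grid, so the fourth job wraps around to the start of the second row.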

Finally, I took the output of that script and used gnuplot to render it, using this script:

#!gnuplot
set terminal png nocrop small size 800,800
set view 75
set ticslevel 0.1
splot 'jobs.dat' using 1:2:3:($3) with points palette

Putting it all together:

$ emake --emake-annodetail=basic --emake-annofile=emake.xml
$ tclsh mapit.tcl emake.xml > jobs.dat
$ gnuplot plot.gp > plot.png