
How long are the jobs in my build?

I’ve been playing with a new visualization for build data. I was looking for a way to really hammer home the point that in most builds, the vast majority of jobs are more-or-less the same length. The “Job Count by Length” report in ElectricInsight does the same thing, but in a “just the facts” manner. I wanted something that would be more visceral.

Then I hit on the idea of mapping the jobs onto a surface plot, using the job duration as the z-coordinate, or “height”, so that longer jobs appear as points high above the x-y plane. In such a view we would expect to see a mostly flat plain, with a small portion of points above it. Sure enough, that’s just what we get. Here’s an example, generated using data from a Mozilla build:

Here’s what I like about this visualization:

  • Nails the primary goal. This visualization is great at demonstrating that most jobs in the build have about the same duration.
  • It looks cool. Given a choice between two visualizations that show the same data, the one that looks cooler definitely has an advantage.

Now, here’s what I don’t like about this visualization:

  • X- and Y-coordinates are arbitrary. For this prototype I just determined the smallest square grid large enough to hold all the jobs in the build, then laid the jobs out in order, row by row: the first job at (0,0), the second at (0,1), and so on. This is simple, and it gives a compact display, but it would be nice if the X- and Y-coordinates had some actual meaning.
  • It’s hard to tell what Z-coordinate any given point has. For example, I can easily see that the vast majority of jobs have roughly the same duration, but what duration is that? 0 seconds? 1 second? 1/2 second?
  • A dense upper layer obscures lower layers. This build is unimodal, but suppose it were instead bimodal: the density of points at height 5 might obscure the existence of points at height 3.
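The “what duration is that?” question can at least be answered numerically rather than by eye. Here is a small shell sketch, not part of my original workflow, that reports the median and 90th-percentile job length from jobs.dat, the row/column/length file produced by the annolib script described below. The function name job_percentiles is hypothetical, and it assumes the length is in the third column, in whatever unit annolib reports (presumably seconds):

```shell
# Hypothetical helper: print the median (p50) and 90th-percentile (p90)
# job length from a "row col length" file such as jobs.dat.
# Reads the file named in $1, or stdin if no file is given.
# Uses the nearest-rank definition: the p-th percentile is the value at
# position ceil(p * N / 100) in the sorted list of N lengths.
job_percentiles() {
    awk '{ print $3 }' "${1:--}" | LC_ALL=C sort -n | awk '
        { len[NR] = $1 }                         # lengths, already sorted
        END {
            if (NR == 0) exit 1                  # no jobs: nothing to report
            n = split("50 90", pcts, " ")
            for (i = 1; i <= n; i++) {
                p = pcts[i]
                rank = int((p * NR + 99) / 100)  # integer ceil(p*NR/100)
                printf "p%d %s\n", p, len[rank]
            }
        }'
}
```

Running job_percentiles jobs.dat prints two lines, p50 followed by the median length and p90 followed by the 90th-percentile length. The nearest-rank method was chosen because it always returns a length that actually occurs in the data, with no interpolation.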

For comparison, here’s the “Job Count by Length” report from ElectricInsight. It uses the same data and tells the same story, but it’s not nearly as visually dramatic:

So, what do you think? Any ideas how I could use the X- and Y-coordinates to convey useful information? Keep reading if you want to see how I made this visualization.

How did I make it?

This visualization was really easy to make, because I leveraged other tools to do all of the hard work for me. The first was ElectricAccelerator, which I used to generate an annotated build log containing timing information for every job in my build (you could also use SparkBuild). Next, I used annolib, a library for processing and analyzing annotation files, to convert the data into x,y,z tuples. I wrote this trivial annolib script:

```tcl
#!/usr/bin/env tclsh
load annolib.so

# Open up the annotation file, specified on the command-line.
set xml [open [lindex $argv 0] r]

# Create an anno object, and use it to parse the annotation data.
set anno [anno create]
$anno load $xml

# Figure out how big the plane needs to be to show every job.
set dim [expr {int(ceil(sqrt([$anno jobcount])))}]
set row 0
set col 0

# Iterate through the jobs, printing (to stdout) the row, column,
# and length of each.
set begin [$anno jobs begin]
set end   [$anno jobs end]
for {set job $begin} {$job != $end} {set job [$anno job next $job]} {
    puts "$row $col [$anno job length $job]"
    incr col
    if { $col == $dim } {
        # Hit the end of a row, start over on the next.
        set col 0
        incr row
    }
}

# Done with the annotation file.
close $xml
```
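If annolib isn’t available but you already have one job length per line from some other source, the same row-major layout can be sketched in plain awk. This is an illustration rather than anything annolib provides; the function name grid_layout is hypothetical. It writes the same “row col length” format, so its output could be fed to the gnuplot step unchanged:

```shell
# Hypothetical helper: the same row-major layout as the Tcl script above,
# without annolib. Input: one job length per line (file named in $1, or
# stdin). Output: "row col length" lines on a dim x dim grid, where
# dim = ceil(sqrt(N)) for N jobs.
grid_layout() {
    awk '
        { len[NR] = $1 }
        END {
            if (NR == 0) exit
            dim = int(sqrt(NR))
            while (dim * dim < NR) dim++     # round up to ceil(sqrt(N))
            for (i = 0; i < NR; i++)         # job i goes to row i/dim, col i%dim
                printf "%d %d %s\n", int(i / dim), i % dim, len[i + 1]
        }' "${1:--}"
}
```

With five jobs, for example, dim is 3, so the jobs land at (0,0), (0,1), (0,2), (1,0) and (1,1), matching what the Tcl loop does when it wraps to a new row whenever the column reaches dim.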

Finally, I took the output of that script and used gnuplot to render it, using this script:

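```gnuplot
#!/usr/bin/env gnuplot
# Render an 800x800 PNG; with no "set output", the image goes to stdout.
set terminal png nocrop small size 800,800
# Tilt the view to 75 degrees about the x axis.
set view 75
# Shrink the gap between the x-y base plane and the bottom of the z axis.
set ticslevel 0.1
# x = row, y = column, z = job length; color each point by its length.
splot 'jobs.dat' using 1:2:3:($3) with points palette
```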

Putting it all together:

```shell
$ emake --emake-annodetail=basic --emake-annofile=emake.xml
$ tclsh mapit.tcl emake.xml > jobs.dat
$ gnuplot plot.gp > plot.png
```
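Since jobs.dat is plain text, it is also easy to get a rough textual counterpart to the “Job Count by Length” report from the same data. The sketch below is only an approximation of the idea, not how ElectricInsight computes its report; the function name job_histogram and the 0.5 default bucket width are arbitrary choices for illustration:

```shell
# Hypothetical helper: a text histogram of job lengths from a
# "row col length" file, loosely in the spirit of ElectricInsight's
# Job Count by Length report.
# Usage: job_histogram [bucket-width] [file]   (defaults: 0.5, stdin)
job_histogram() {
    awk -v w="${1:-0.5}" '
        BEGIN { max = 0 }
        {
            b = int($3 / w)              # bucket b covers [b*w, (b+1)*w)
            count[b]++
            if (b > max) max = b
        }
        END {
            if (NR == 0) exit
            for (b = 0; b <= max; b++)   # include empty buckets too
                printf "%g-%g %d\n", b * w, (b + 1) * w, count[b] + 0
        }' "${2:--}"
}
```

For example, job_histogram 0.25 jobs.dat prints one line per bucket, the bucket’s range followed by its job count. For a build like the one above, you would expect nearly all of the count to sit in the first bucket or two.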

Comments

  1. How about time on the X axis and number of concurrent jobs on the Y axis? And then use the gnuplot 3D surface plot to give a more visual way to see the heights (and maybe the color map on the X-Y plane showing the areas with the greatest heights).

    • Eric Melski says:

      @Aaron: thanks, that’s an interesting idea! Unfortunately it doesn’t work out too well in practice. In this case there were only 6 concurrent jobs at any time, so the graph would end up being about 5,400 pixels “long” by 6 pixels “wide” if we want to have a discrete pixel for each job. Also, I think this arrangement would basically just duplicate the primary display in ElectricInsight, except in “3D”.

Trackbacks

  1. […] August 19, 2010, Eric Melski. In response to my post about visualizing the lengths of the jobs in a build, one reader suggested a few tweaks to my gnuplot script to make the graph a proper surface plot. I […]

  2. […] are two factors that explain this lackluster result. The first is homogeneity: in most builds, the majority of the jobs are more-or-less the same length. For example, 90% of the jobs in the Mozilla build are less than 0.25s long. 80% of the jobs in the […]

  3. […] a job will take. But if the build consists primarily of jobs that are about the same length (and most builds do), then you should see a steady stream of flashes […]
