I’ve been playing with a new visualization for build data. I was looking for a way to really hammer home the point that in most builds, the vast majority of jobs are more-or-less the same length. The “Job Count by Length” report in ElectricInsight does the same thing, but in a “just the facts” manner. I wanted something that would be more visceral.
Then I struck on the idea of mapping the jobs onto a surface plot, using the job duration as the z-coordinate or “height”, so longer jobs have points high above the x-y plane. In such a view, we would expect to see a mostly flat plain, with a small portion of points above the plain. Sure enough, that’s just what we get. Here’s an example, generated using data from a Mozilla build:
Here’s what I like about this visualization:
- Nails the primary goal. This visualization is great at demonstrating that most jobs in the build have about the same duration.
- It looks cool. Given a choice between two visualizations that show the same data, the one that looks cooler definitely has an advantage.
Now, here’s what I don’t like about this visualization:
- X- and Y-coordinates are arbitrary. For this prototype I just determined the smallest box large enough to show all the jobs in the build, then plotted the first job at 0,0; the second at 0,1, etc. This is simple, and it gives a compact display, but it would be nice if the X- and Y-coordinates had some actual meaning.
- It’s hard to tell what Z-coordinate any given point has. For example, I can easily see that the vast majority of jobs have roughly the same duration, but what duration is that? 0 seconds? 1 second? 1/2 second?
- A dense upper layer obscures lower layers. This build is unimodal, but suppose it were bimodal instead: the density of points at height 5 might obscure the existence of points at height 3.
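To make the arbitrary X/Y layout from the first point concrete, here is a minimal Python sketch of that kind of placement. It is not the script I actually used; the function name `layout_jobs` and the sample durations are made up for illustration, and it assumes the “smallest box” is a square.

```python
import math


def layout_jobs(durations):
    """Map job durations onto (x, y, z) tuples in a compact square grid.

    Assumption: the "smallest box" is a square whose side is the smallest
    integer with side * side >= number of jobs. Jobs are placed in build
    order: (0, 0), (0, 1), ... and then on to the next x value.
    """
    n = len(durations)
    if n == 0:
        return []
    side = math.isqrt(n - 1) + 1  # smallest side such that side * side >= n
    return [(i // side, i % side, d) for i, d in enumerate(durations)]


if __name__ == "__main__":
    # Hypothetical durations in seconds: mostly short jobs plus one long one.
    sample = [0.5, 0.4, 0.6, 12.0, 0.5]
    for x, y, z in layout_jobs(sample):
        print(x, y, z)
```

The position of a point only encodes the job’s order in the build, which is exactly why the X- and Y-coordinates carry so little meaning.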
For comparison, here’s the “Job Count by Length” report from ElectricInsight. It uses the same data, and tells the same story, but it’s not nearly as visually dramatic:
So, what do you think? Any ideas how I could use the X- and Y-coordinates to convey useful information? Keep reading if you want to see how I made this visualization.
How did I make it?
This visualization was really easy to make, because I leveraged other tools to do all of the hard work for me. The first was ElectricAccelerator, which I used to generate an annotated build log containing timing information for every job in my build (you could also use SparkBuild). Next, I used annolib, a library for processing and analyzing annotation files, to convert the data into x,y,z tuples. I wrote this trivial annolib script:
Finally, I took the output of that script and used gnuplot to render it, using this script:
Putting it all together: