post

How long are the jobs in my build? part 2

In response to my post about visualizing the lengths of the jobs in a build, one reader suggested a few tweaks to my gnuplot script to make the graph a proper surface plot. I like the look of this:

This version addresses some of the short-comings of my original:

  • It’s easier to determine the z-coordinate of a given point. In the original that was nearly impossible. It’s still a little tricky here because of the perspective, but it’s a step in the right direction.
  • Lower layers are not obscured. Originally, a dense layer of points could obscure points with a lower z-value. This version avoids that problem because you can see places where the surface dips.

Unfortunately, this version introduces some new problems:

  • Raw data points are averaged. In order to produce this surface plot, gnuplot computes a weighted average of the data points. Averaging itself is not necessarily a problem. The trouble here is that the layout of the data points is completely arbitrary, as you may recall from the previous post. That means that this plot effectively picks a handful of random data points, averages them, and plots the result. We still see the general trend — that most of the jobs are about the same length — but it feels a bit phony.
  • Implies patterns where there are none. When I first saw this image, I was struck by the “mountain range” running across the plot, a bit left of center. I hadn’t seen that in my original graph, so naturally I was intrigued. I spent hours trying to understand why that feature might be present, and finally came to this conclusion: it isn’t real. It’s just an artifact of the graphing method. Remember, the layout of the points is completely arbitrary, so it would be quite odd for there to really be a pattern like this cutting across the plot. In fact, I found that similar “features” appeared no matter what dimensions I used for the plot. I think the reason is that in this mode, gnuplot is not plotting the raw data, but rather a weighted average of adjacent points. This will tend to introduce relationships between those points that are not actually real.

OK, so this revised version is definitely interesting. I’m not sure that it’s better necessarily, given the defects I mentioned above. And unfortunately it doesn’t help at all with the issue of making something useful out of the X/Y coordinates. Nevertheless, thanks Aaron for the suggestion!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: