Blinkenlights for ElectricAccelerator

Watching builds run is boring. I mean, there’s not really much to look at, besides the build log scrolling by. And the “bursty” nature of the output with ElectricAccelerator makes things even worse, since you’ll get a long pause with no apparent progress, followed by a blast of more output than you can handle — like drinking from a fire hose. Obviously stuff is going on during that long pause, but there’s nothing externally visible. Wouldn’t it be nice to see some kind of indication of the build progressing? Something like this:

I put together this visualization to satisfy my desire for a blinkenlights display for my build. Each light represents an agent used by the build, and it lights up every time a new job is dispatched to that agent. There’s no correlation between the amount of time it takes for the light to fade and the duration of the job, since there’s no way to know a priori how long a job will take. But if the build consists primarily of jobs that are about the same length (and most builds do), then you should see a steady stream of flashes throughout.

--emake-monitor

This visualization is powered by a relatively new feature in ElectricAccelerator: add --emake-monitor=host:port to the emake command line, and emake will broadcast status messages to the specified destination using UDP. As of Accelerator 5.2.0, emake generates four types of status messages. Each message is transmitted in plain text, as a space-separated list of words. The first word indicates the type of message; the remaining words are the parameters of the message:

  • ADD_JOB jobId jobType targetName: a new job has been added to the work queue.
  • START_JOB jobId time agent: a job has started running on the specified agent.
  • FINISH_JOB jobId time: a job has finished running.
  • FINISH_BUILD: the build has completed.

All you need is a program that listens for these messages and does something interesting with them. ElectricInsight is one such program: select the File -> Monitor live build… menu option, enter the same host:port information, and Insight will render the jobs in the build in real time as they run. Not bad, but not as glitzy as I’d like.
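
If you want to roll your own, the listening side is tiny. Here’s a minimal sketch using the same TclUDP extension as the full program below; it just dumps each message to stdout. The port number (6789) is arbitrary — use whatever you passed to --emake-monitor:

package require udp

# Listen for emake monitor messages and print each one as it arrives.
# Pair this with --emake-monitor=<this host>:6789 on the emake command line.
set sock [udp_open 6789]
fconfigure $sock -buffering none -blocking 0
fileevent $sock readable {
    puts [string trim [read $sock]]
}

# Common idiom to keep the app running indefinitely.
set forever 0
vwait forever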

Writing blinkenlights

My blinkenlights visualization uses just one of the messages: START_JOB. Each time it receives the message, it maps the agent named in the message to one of the lights, illuminates it, and then fades it at a fixed rate. It’s written in Tcl/Tk, naturally, using a couple of great third-party extensions, so the implementation is less than 100 lines of code.

The first extension is Tkpath, which I’ve mentioned previously. I used prect items to create the “lights”, and handled the fading effect by just progressively decreasing the alpha from fully opaque to fully transparent with a series of timer events firing at a predetermined rate.

The second extension is TclUDP, which makes it trivial to connect to a UDP socket from Tcl. Once I have that socket, I can use all the regular Tcl magic like fileevent to make my script automatically respond to the arrival of a new message.

Here’s the code in full:

package require tkpath
package require udp

# fade - update the opacity of the given item to the given value.  Afterwards,
# schedules another event to update the opacity again, to a slightly smaller
# value, until the value reaches zero.

proc fade {id {count 100}} {
    global events
    .c itemconfigure a$id -fillopacity [expr {double($count) / 100}]
    incr count -5
    catch {after cancel $events($id)}
    if { $count >= 0 } {
        set events($id) [after 5 [list fade $id $count]]
    }
}

# next - called whenever there is another message waiting on the socket.

proc next {sock} {
    global ids
    set msg [read $sock]
    if { [lindex $msg 0] eq "START_JOB" } {
        set agent [lindex $msg 3]
        if { ![info exists ids($agent)] } {
            set ids($agent) [array size ids]
        }
        fade $ids($agent)
    }
}

# Set the dimensions; my test cluster has 16 agents, so I did a 4x4 layout.

set rows 4
set cols 4
set boxx 60
set boxy 60

# Set up the tkpath canvas and the "lights".

set c [::tkp::canvas .c -background black \
           -height [expr {($boxy * $rows) + 5}] \
           -width  [expr {($boxx * $cols) + 5}]]
wm geometry . [expr {($boxx * $cols) + 27}]x[expr {($boxy * $rows) + 27}]

for {set x 0} {$x < $cols} {incr x} {
    for {set y 0} {$y < $rows} {incr y} {
        set x1 [expr {($x * ($boxx + 5)) + 5}]
        set x2 [expr {$x1 + $boxx}]
        set y1 [expr {($y * ($boxy + 5)) + 5}]
        set y2 [expr {$y1 + $boxy}]
        set id [expr {($x * $rows) + $y}]
        .c create prect $x1 $y1 $x2 $y2 -rx 5 -fill #3399cc -tags a$id \
            -fillopacity 0
    }
}
pack .c -expand yes -fill both
wm title . "Cluster Blinkenlights"
update

# Get the host and port number from the command-line.

set host [lindex [split $argv :] 0]
set port [lindex [split $argv :] 1]

# Create the udp socket, set it to non-blocking mode, then set up a fileevent
# that will trigger anytime there's data available on the socket.

set sock [udp_open $port]
fconfigure $sock -buffering none -blocking 0 -remote [list $host $port]
fileevent $sock readable [list next $sock]

# Common idiom to keep the app running indefinitely.

set forever 0
vwait forever

Future work

This is a pretty fun way to monitor the status of a build in progress, but I think there are two things that could make it even better:

  • Watch the entire cluster, instead of just one build. Because this visualization is driven by data streaming from emake, for all practical purposes it’s limited to showing the activity in a single build. I would love to instead be able to view a single display showing the entire cluster, with concurrently running builds flickering in different colors. I think that would be a really interesting display, and might provide some insight into the cluster sharing behaviors of the entire system. I think to really do that properly, we’d need to be intercepting events from every agent, but unfortunately the agent doesn’t have a feature like --emake-monitor.
  • Make it an actual physical gadget. It might be fun to wire together some LEDs, maybe controlled by an Arduino or something, to make a tangible device that could sit on my desk. It’s been a long, long time since I’ve done anything like that though. Plus, if there are a lot of agents in the cluster, it may be costly and impractical to manufacture.

What do you think?

Are you using the right colorspace?

If you’re like me, a programmer with no formal UI design training, you’re probably accustomed to working with colors in terms of their RGB values. And, if you’re like me, you’ve probably been frustrated by the seeming irrationality of that colorspace. For example, suppose you want to find the right foreground color for a given background to ensure high legibility. If you’re stuck in RGB-land, there’s no reliable way to get from point A to point B. If you do find a combination that works, the relationship between the two colors often seems arbitrary.

I recently learned that my singular focus on RGB is the problem, because it has no relationship to the way that the human eye perceives color. Switch to a different colorspace, like HSV (for hue, saturation, and value) and voila! Suddenly colors make sense. If you’re doing any sort of UI design, and you’re working exclusively in the RGB colorspace, you’re doing it wrong.

For legibility, use HSV

Unfortunately, I’ve found that there’s no single “best” colorspace. Some problems are better solved in one colorspace, other problems in another. When choosing a text color to maximize legibility against a given background, HSV works really well. Here are some examples, with the foreground and background colors in both RGB and HSV:

Sample                  Role          RGB            HSV
The quick brown fox …   Foreground    147 196 147    120  25  77
                        Background     51  68  51    120  25  27
The quick brown fox …   Foreground    110 127 127    180  13  50
                        Background    221 255 255    180  13 100
The quick brown fox …   Foreground     51  76 102    210  50  30
                        Background    102 153 204    210  50  80

I could keep going, but I’m sure you see the point: in the RGB colorspace, there’s no predictable relationship between the foreground and background colors. In HSV, it’s a nice, regular pattern. That definitely appeals to the rational programmer in me. If you’re looking for a foreground color yourself, I suggest starting with a delta in value of at least 30.
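
To make that concrete, here’s a little sketch of how you might apply the rule in Tcl/Tk. The hsvToRgb helper is hand-rolled for illustration (it isn’t part of Tk); hue is in degrees, saturation and value on a 0-100 scale, matching the table above:

package require Tk

# hsvToRgb - convert hue (0-360 degrees) plus saturation and value
# (0-100) to a Tk-style #rrggbb string, via the standard sector formula.
proc hsvToRgb {h s v} {
    set s [expr {$s / 100.0}]
    set v [expr {$v / 100.0}]
    set h [expr {fmod($h, 360) / 60.0}]
    set i [expr {int($h)}]
    set f [expr {$h - $i}]
    set p [expr {$v * (1 - $s)}]
    set q [expr {$v * (1 - $s * $f)}]
    set t [expr {$v * (1 - $s * (1 - $f))}]
    switch $i {
        0 { foreach {r g b} [list $v $t $p] { break } }
        1 { foreach {r g b} [list $q $v $p] { break } }
        2 { foreach {r g b} [list $p $v $t] { break } }
        3 { foreach {r g b} [list $p $q $v] { break } }
        4 { foreach {r g b} [list $t $p $v] { break } }
        5 { foreach {r g b} [list $v $p $q] { break } }
    }
    return [format "#%02x%02x%02x" [expr {round($r * 255)}] \
                [expr {round($g * 255)}] [expr {round($b * 255)}]]
}

# The first example from the table: same hue and saturation, values 50
# apart, comfortably past the suggested minimum delta of 30.
label .demo -text "The quick brown fox ..." \
    -background [hsvToRgb 120 25 27] -foreground [hsvToRgb 120 25 77]
pack .demo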

For gradients, use HSL

When you’re trying to generate a color gradient, I’ve found that the best choice is HSL, for hue, saturation and lightness (note that hue and saturation here have slightly different meanings than in HSV). Here’s an example, with both RGB and HSL values:

(each step below was rendered as “The quick brown fox …” in the corresponding color)

RGB            HSL
 51 149 204    56 60 50
 71 160 209    56 60 55
 92 170 214    56 60 60
112 181 219    56 60 65
133 191 224    56 60 70
153 202 230    56 60 75
173 213 235    56 60 80
194 223 240    56 60 85
214 234 245    56 60 90

Again, the progression in RGB is awkward and seemingly unpredictable; the progression in HSL is simple.
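
If you’d rather compute the steps than transcribe them, here’s a sketch. The hslToRgb helper is again hand-rolled, with all three components on the 0-100 scale used in the table; hold hue and saturation fixed and step the lightness:

# hue2rgb - helper for the standard HSL-to-RGB formula.
proc hue2rgb {p q t} {
    if { $t < 0 } { set t [expr {$t + 1}] }
    if { $t > 1 } { set t [expr {$t - 1}] }
    if { $t < 1.0/6 } { return [expr {$p + ($q - $p) * 6 * $t}] }
    if { $t < 1.0/2 } { return $q }
    if { $t < 2.0/3 } { return [expr {$p + ($q - $p) * (2.0/3 - $t) * 6}] }
    return $p
}

# hslToRgb - hue, saturation and lightness, each on a 0-100 scale, to a
# Tk-style #rrggbb string.
proc hslToRgb {h s l} {
    set h [expr {$h / 100.0}]
    set s [expr {$s / 100.0}]
    set l [expr {$l / 100.0}]
    if { $s == 0 } {
        set r $l ; set g $l ; set b $l
    } else {
        set q [expr {$l < 0.5 ? $l * (1 + $s) : $l + $s - $l * $s}]
        set p [expr {2 * $l - $q}]
        set r [hue2rgb $p $q [expr {$h + 1.0/3}]]
        set g [hue2rgb $p $q $h]
        set b [hue2rgb $p $q [expr {$h - 1.0/3}]]
    }
    return [format "#%02x%02x%02x" [expr {round($r * 255)}] \
                [expr {round($g * 255)}] [expr {round($b * 255)}]]
}

# Fixed hue and saturation, lightness stepping by 5: prints the same
# nine colors as the table above, from #3395cc to #d6eaf5.
for {set l 50} {$l <= 90} {incr l 5} {
    puts [hslToRgb 56 60 $l]
}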

Is RGB good for anything?

Obviously RGB is good for something: hardware, where colors are literally created by the combination of red, green and blue LEDs (or phosphors, if you’re old school) in varying intensities. That’s why RGB is so prevalent in graphics libraries and programming in general — the concept just bled up through the abstraction layers.

Also, keep in mind that you can convert back and forth between RGB and HSV, or RGB and HSL. That means that the RGB values shown above are not really as “arbitrary” as I made them out to be — but the conversions are complex, much too difficult to do in your head. So it’s much easier to work in HSV or HSL, then convert only at the end, just before you have to specify the color to the computer.
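
For instance, here’s what one direction looks like: a hand-rolled RGB-to-HSV conversion in Tcl (Tcl 8.5 or later, for the min and max expr functions; hue comes back in degrees, saturation and value on a 0-100 scale):

# rgbToHsv - convert 0-255 RGB components to hue (degrees) plus
# saturation and value (0-100).
proc rgbToHsv {r g b} {
    set r [expr {$r / 255.0}]
    set g [expr {$g / 255.0}]
    set b [expr {$b / 255.0}]
    set max [expr {max($r, $g, $b)}]
    set min [expr {min($r, $g, $b)}]
    set delta [expr {$max - $min}]
    if { $delta == 0 } {
        set h 0
    } elseif { $max == $r } {
        set h [expr {60 * fmod(($g - $b) / $delta, 6)}]
    } elseif { $max == $g } {
        set h [expr {60 * ((($b - $r) / $delta) + 2)}]
    } else {
        set h [expr {60 * ((($r - $g) / $delta) + 4)}]
    }
    if { $h < 0 } { set h [expr {$h + 360}] }
    set s [expr {$max == 0 ? 0 : 100.0 * $delta / $max}]
    return [list [expr {round($h)}] [expr {round($s)}] \
                [expr {round($max * 100)}]]
}

puts [rgbToHsv 147 196 147]   ;# 120 25 77, the first foreground above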

I wrote a little Tcl/Tk app that lets me play around with all three colorspaces simultaneously; you’re welcome to it here. If you want to read more about color selection, I highly recommend Choosing Colors for Data Visualization [PDF], by Maureen Stone.

How many agents did my build use?

When you run a parallel build, how many jobs are actually running in parallel during the life of the build? If you’re using ElectricAccelerator, you can load the build annotation file in ElectricInsight and eyeball it, as long as you have a small, uncongested cluster. But if you have a big cluster, and lots of other builds running simultaneously, the build may touch many more distinct agents than it actually uses simultaneously at any given point. It’d be great to see a simple chart like this:

With this graph I can see at a glance that this build used 48 agents most of the time, although there was a lot of time when it used only one agent, probably due to serializations in the build. In this post I’ll show you how to generate a report like this using data from an annotation file.

Counting agents in use

Counting the agents in use over the lifetime of the build is a simple algorithm: make a list of all the job start and end events in the build, sorted by time. Then scan the list, incrementing the count of agents in use every time you find a start event, and decrementing it every time you find an end event. Here’s the code, using annolib, the annotation analysis library:

#!/usr/bin/env tclsh
load annolib.so

proc CountAgents {annofile} {
    global anno total

    set xml  [open $annofile r]
    set anno [anno create]
    $anno load $xml

    # These values will tell us what type of event we have later.

    set START_EVENT  1
    set END_EVENT   -1

    # Iterate through all the jobs in the build.

    set first [$anno jobs begin]
    set last  [$anno jobs end]
    for {set job $first} {$job != $last} {set job [$anno job next $job]} {
        # Get the timing information for this job.  If this job was not
        # actually run, its timing information will be empty.

        set t [lindex [$anno job timing $job] 0]
        if { [llength $t] == 0 } {
            continue
        }
        foreach {start end agent} $t {
            break
        }

        # Add a start and an end event for this job to the master list.

        lappend events [list $start $START_EVENT] [list $end $END_EVENT]
    }

    # Order the events chronologically.

    set events [lsort -real -increasing -index 0 $events]

    # Scan the list of events.  Every time we see a START event, increment
    # the count of agents in use; every time we see an END event, decrement
    # the count.  This way, "count" always reflects the number of agents
    # in use.

    set count 0
    set last  0
    foreach event $events {
        foreach {t e} $event { break }
        if { ![info exists total($count)] } {
            set total($count) 0
        }

        # Add the time interval between the current and the previous event 
        # to the total time for "count".

        set total($count) [expr {$total($count) + ($t - $last)}]

        # Update the in-use counter.  I chose the event type values
        # so that we can simply add the event type to the counter.

        incr count $e

        # Track the current time, so we can compute the size of the next
        # interval.

        set last $t
    }
}

CountAgents [lindex $argv end]

After this code runs, we’ll have the amount of time spent using zero agents, one agent, two agents, and so on, in the global array total. The only thing left to do is output the result in a usable form:

set output "-raw"
if { [llength $argv] >= 2 } {
    set output [lindex $argv 0]
}
switch -- $output {
    "-raw" {
        foreach count [lsort -integer [array names total]] {
            if { $total($count) > 0.0001 } {
                puts "$count $total($count)"
            }
        }
    }

    "-text" {
        set duration [$anno duration]
        puts "Agents in use by portion of build time"
        foreach count [lsort -integer [array names total]] {
            set len [expr {round(double($total($count)*70) / $duration)}]
            if { $len > 0 } {
                puts [format "%2d %s" $count [string repeat * $len]]
            }
        }
    }

    "-google" {
        set url "http://chart.apis.google.com/chart"
        append url "?chs=300x225"
        append url "&cht=p"
        append url "&chtt=Agents+in+use+by+portion+of+build+time"
        append url "&chco=3399CC"
        set lbl ""
        set dat ""
        set lblsep ""
        set datsep ""
        set duration [$anno duration]
        foreach count [lsort -integer [array names total]]  {
            set pct [expr {($total($count) * 100) / $duration}]
            if { $pct >= 1.0 } {
                append lbl $lblsep$count
                append dat $datsep[format "%0.2f" $pct]
                set lblsep "|"
                set datsep ","
            }
        }
        append url "&chd=t:$dat"
        append url "&chl=$lbl"
        puts $url
    }
}

This gives us three choices for the output format:

  • -raw, which just dumps the raw data, one entry per line.
  • -text, which formats the data as a simple ASCII bar chart.
  • -google, which emits a Google Charts URL you can put into your browser to see a chart like the one at the top of this post.

For example, if I run this script as tclsh count_agents.tcl -text sample.xml, the output looks like this:

Agents in use by portion of build time
 0 ***
 1 *****************
 2 ***
 3 *
 4 *
 5 *
47 *
48 ************************************
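
The -google output is just a URL; for a build like this one it would look something like the following (the percentages here are illustrative, not from a real build):

http://chart.apis.google.com/chart?chs=300x225&cht=p&chtt=Agents+in+use+by+portion+of+build+time&chco=3399CC&chd=t:4.29,24.29,4.29,1.43,1.43,1.43,1.43,51.43&chl=0|1|2|3|4|5|47|48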

So that’s it: another trivial annolib script, another slick build visualization!

How long are the jobs in my build? part 2

In response to my post about visualizing the lengths of the jobs in a build, one reader suggested a few tweaks to my gnuplot script to make the graph a proper surface plot. I like the look of this:

This version addresses some of the shortcomings of my original:

  • It’s easier to determine the z-coordinate of a given point. In the original that was nearly impossible. It’s still a little tricky here because of the perspective, but it’s a step in the right direction.
  • Lower layers are not obscured. Originally, a dense layer of points could obscure points with a lower z-value. This version avoids that problem because you can see places where the surface dips.

Unfortunately, this version introduces some new problems:

  • Raw data points are averaged. In order to produce this surface plot, gnuplot computes a weighted average of the data points. Averaging itself is not necessarily a problem. The trouble here is that the layout of the data points is completely arbitrary, as you may recall from the previous post. That means that this plot effectively picks a handful of random data points, averages them, and plots the result. We still see the general trend — that most of the jobs are about the same length — but it feels a bit phony.
  • Implies patterns where there are none. When I first saw this image, I was struck by the “mountain range” running across the plot, a bit left of center. I hadn’t seen that in my original graph, so naturally I was intrigued. I spent hours trying to understand why that feature might be present, and finally came to this conclusion: it isn’t real. It’s just an artifact of the graphing method. Remember, the layout of the points is completely arbitrary, so it would be quite odd for there to really be a pattern like this cutting across the plot. In fact, I found that similar “features” appeared no matter what dimensions I used for the plot. I think the reason is that in this mode, gnuplot is not plotting the raw data, but rather a weighted average of adjacent points. This will tend to introduce relationships between those points that are not actually real.

OK, so this revised version is definitely interesting. I’m not sure it’s necessarily better, given the defects I mentioned above. And unfortunately it doesn’t help at all with the issue of making something useful out of the X/Y coordinates. Nevertheless, thanks Aaron for the suggestion!

How long are the jobs in my build?

I’ve been playing with a new visualization for build data. I was looking for a way to really hammer home the point that in most builds, the vast majority of jobs are more-or-less the same length. The “Job Count by Length” report in ElectricInsight does the same thing, but in a “just the facts” manner. I wanted something that would be more visceral.

Then I struck on the idea of mapping the jobs onto a surface plot, using the job duration as the z-coordinate or “height”, so longer jobs have points high above the x-y plane. In such a view, we would expect to see a mostly flat plain, with a small portion of points above the plain. Sure enough, that’s just what we get. Here’s an example, generated using data from a Mozilla build:

Here’s what I like about this visualization:

  • Nails the primary goal. This visualization is great at demonstrating that most jobs in the build have about the same duration.
  • It looks cool. Given a choice between two visualizations that show the same data, the one that looks cooler definitely has an advantage.

Now, here’s what I don’t like about this visualization:

  • X- and Y-coordinates are arbitrary. For this prototype I just determined the smallest box large enough to show all the jobs in the build, then plotted the first job at 0,0; the second at 0,1, etc. (see the sketch after this list). This is simple, and it gives a compact display, but it would be nice if the X- and Y-coordinates had some actual meaning.
  • It’s hard to tell what Z-coordinate any given point has. For example, I can easily see that the vast majority of jobs have roughly the same duration, but what duration is that? 0 seconds? 1 second? 1/2 second?
  • A dense upper layer obscures lower layers. Although this build is unimodal, suppose it was instead bimodal — the density of points at height 5 might obscure the existence of points at height 3.
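
For reference, here’s a sketch of the data-generation step described in the first bullet above, reusing the annolib idioms from the agent-counting script earlier on this page. It lays the jobs out row by row in the smallest square that holds them all, and emits one x/y/duration triple per job for gnuplot to plot (the gnuplot side is omitted):

#!/usr/bin/env tclsh
load annolib.so

# Collect the duration of every job that actually ran.
set anno [anno create]
$anno load [open [lindex $argv 0] r]
set durations {}
set first [$anno jobs begin]
set last  [$anno jobs end]
for {set job $first} {$job != $last} {set job [$anno job next $job]} {
    set t [lindex [$anno job timing $job] 0]
    if { [llength $t] == 0 } {
        continue
    }
    foreach {start end agent} $t { break }
    lappend durations [expr {$end - $start}]
}

# Lay the jobs out in the smallest square that holds them all: first job
# at 0,0, second at 0,1, and so on, with duration as the z-coordinate.
set side [expr {int(ceil(sqrt([llength $durations])))}]
set i 0
foreach d $durations {
    puts "[expr {$i / $side}] [expr {$i % $side}] $d"
    if { $i % $side == $side - 1 } {
        # Blank line between rows, the grid layout gnuplot's splot expects.
        puts ""
    }
    incr i
}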

For comparison, here’s the “Job Count by Length” report from ElectricInsight. It uses the same data, and tells the same story, but it’s not nearly as visually dramatic:

So, what do you think? Any ideas how I could use the X- and Y-coordinates to convey useful information? Keep reading if you want to see how I made this visualization.