When you run a parallel build, how many jobs are actually running in parallel during the life of the build? If you’re using ElectricAccelerator, you can load the build annotation file in ElectricInsight and eyeball it, as long as you have a small, uncongested cluster. But if you have a big cluster, and lots of other builds running simultaneously, the build may touch many more distinct agents than it actually uses simultaneously at any given point. It’d be great to see a simple chart like this:
With this graph I can see at a glance that this build used 48 agents most of the time, although there was a lot of time when it used only one agent, probably due to serializations in the build. In this post I’ll show you how to generate a report like this using data from an annotation file.
Counting agents in use
Counting the agents in use over the lifetime of the build takes only a simple algorithm: make a list of all the job start and end events in the build, sorted by time. Then scan the list, incrementing the count of agents in use every time you find a start event, and decrementing it every time you find an end event. Here’s the code, using annolib, the annotation analysis library:
#!tclsh

load annolib.so

proc CountAgents {annofile} {
    global anno total

    set xml [open $annofile r]
    set anno [anno create]
    $anno load $xml

    # These values will tell us what type of event we have later.

    set START_EVENT  1
    set END_EVENT   -1

    # Start with an empty event list, so that a build in which no jobs
    # actually ran does not leave "events" undefined.

    set events {}

    # Iterate through all the jobs in the build.

    set first [$anno jobs begin]
    set last  [$anno jobs end]
    for {set job $first} {$job != $last} {set job [$anno job next $job]} {
        # Get the timing information for this job.  If this job was not
        # actually run, its timing information will be empty.

        set t [lindex [$anno job timing $job] 0]
        if { [llength $t] == 0 } {
            continue
        }
        foreach {start end agent} $t { break }

        # Add a start and an end event for this job to the master list.

        lappend events [list $start $START_EVENT] [list $end $END_EVENT]
    }

    # Order the events chronologically.

    set events [lsort -real -increasing -index 0 $events]

    # Scan the list of events.  Every time we see a START event, increment
    # the count of agents in use; every time we see an END event, decrement
    # the count.  This way, "count" always reflects the number of agents
    # in use.

    set count 0
    set last  0
    foreach event $events {
        foreach {t e} $event { break }
        if { ![info exists total($count)] } {
            set total($count) 0
        }

        # Add the time interval between the current and the previous event
        # to the total time for "count".

        set total($count) [expr {$total($count) + ($t - $last)}]

        # Update the in-use counter.  I chose the event type values
        # so that we can simply add the event type to the counter.

        incr count $e

        # Track the current time, so we can compute the size of the next
        # interval.

        set last $t
    }
}

CountAgents [lindex $argv end]
After this code runs, we’ll have the amount of time spent using one agent, two agents, three agents, etc. in the global array total. The only thing left to do is output the result in a usable form:
set output "-raw"
if { [llength $argv] >= 2 } {
    set output [lindex $argv 0]
}

switch -- $output {
    "-raw" {
        foreach count [lsort -integer [array names total]] {
            if { $total($count) > 0.0001 } {
                puts "$count $total($count)"
            }
        }
    }

    "-text" {
        set duration [$anno duration]
        puts "Agents in use by portion of build time"
        foreach count [lsort -integer [array names total]] {
            set len [expr {round(double($total($count)*70) / $duration)}]
            if { $len > 0 } {
                puts [format "%2d %s" $count [string repeat * $len]]
            }
        }
    }

    "-google" {
        set url "http://chart.apis.google.com/chart"
        append url "?chs=300x225"
        append url "&cht=p"
        append url "&chtt=Agents+in+use+by+portion+of+build+time"
        append url "&chco=3399CC"
        set lbl ""
        set dat ""
        set lblsep ""
        set datsep ""
        set duration [$anno duration]
        foreach count [lsort -integer [array names total]] {
            set pct [expr {($total($count) * 100) / $duration}]
            if { $pct >= 1.0 } {
                append lbl $lblsep$count
                append dat $datsep[format "%0.2f" $pct]
                set lblsep "|"
                set datsep ","
            }
        }
        append url "&chd=t:$dat"
        append url "&chl=$lbl"
        puts $url
    }
}
This gives us three choices for the output format:
- -raw, which just dumps the raw data, one entry per line.
- -text, which formats the data as a simple ASCII bar chart.
- -google, which emits a Google Charts URL you can put into your browser to see a chart like the one at the top of this post.
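One detail worth knowing about the -text mode: each bucket is scaled to a bar of at most 70 characters, and a bucket whose bar rounds to zero is not printed at all. The sketch below isolates that arithmetic so you can see which buckets survive. The proc name BarLength and the timings are made up for illustration; only the formula comes from the script.

```tcl
# Hypothetical helper mirroring the -text scaling: a bucket's share of the
# build time, scaled to a bar of at most "width" characters.
proc BarLength {seconds duration {width 70}} {
    return [expr {round(double($seconds) * $width / $duration)}]
}

# Invented numbers: a 200-second build that spent 40 s on one agent,
# 1 s on three agents, and 100 s on 48 agents.
foreach {agents seconds} {1 40.0 3 1.0 48 100.0} {
    set len [BarLength $seconds 200.0]
    if { $len > 0 } {
        puts [format "%2d %s" $agents [string repeat * $len]]
    }
}
```

With these numbers the three-agent bucket scales to 0.35 characters, which rounds to zero, so only the rows for 1 and 48 agents are printed. The -google mode applies a similar cutoff, dropping any bucket below 1% of the build duration.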
For example, if I run this script as tclsh count_agents.tcl -text sample.xml, the output looks like this:
Agents in use by portion of build time
 0 ***
 1 *****************
 2 ***
 3 *
 4 *
 5 *
47 *
48 ************************************
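If you want to check the counting logic on data small enough to verify by hand, the same start/end sweep can be run without annolib or an annotation file. This is a minimal sketch, not part of the script above: the proc name SweepCount and the job timings are invented, and it assumes Tcl 8.5 or later for dict and lassign.

```tcl
# Hypothetical helper: sweep over a list of {start end} intervals and return
# a dict mapping "jobs running at once" to the total time spent at that level.
proc SweepCount {intervals} {
    set events {}
    foreach iv $intervals {
        lassign $iv start end
        # +1 marks a job starting, -1 marks a job ending.
        lappend events [list $start 1] [list $end -1]
    }
    set events [lsort -real -increasing -index 0 $events]

    set count 0
    set last  0
    set total [dict create]
    foreach event $events {
        lassign $event t e
        # Credit the time since the previous event to the current level.
        set prev 0.0
        if { [dict exists $total $count] } {
            set prev [dict get $total $count]
        }
        dict set total $count [expr {$prev + ($t - $last)}]
        incr count $e
        set last $t
    }
    return $total
}

# Invented timings: three jobs, two of which overlap between t=2 and t=4.
set sample {{0.0 4.0} {2.0 6.0} {7.0 8.0}}
set result [SweepCount $sample]
foreach n [lsort -integer [dict keys $result]] {
    puts "$n [dict get $result $n]"
}
```

For these intervals the sweep reports 1 second with nothing running (the gap between t=6 and t=7), 5 seconds with one job running, and 2 seconds with two jobs running, which adds up to the 8 seconds covered by the sample.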
So that’s it: another trivial annolib script, another slick build visualization!