Documentation—clustersnapshot
NAME
clustersnapshot - retrieve live benchmark and capacity stats for nodes
SYNOPSIS
The clustersnapshot utility retrieves a live statistics table detailing node capacity and resources.
Retrieving Stats
The basic stat table shows all nodes.
> clustersnapshot
host  b_all  b_cpu  b_io   b_mem  date      live  load  mhz   nrunning  nusers  uptime
0of1  2.336  0.718  0.624  0.994  22:20:48  1     2.1   1992  3         0       108
0of3  2.293  0.726  0.580  0.987  21:42:55  1     2.1   1992  3         0       156
0of4  2.837  0.574  1.407  0.856  22:20:47  1     2.1   2522  3         0       156
0of8  2.553  0.517  1.243  0.794  22:20:47  1     0.1   2792  1         3       90
1of0  2.345  0.728  0.619  0.998  22:20:48  1     2.1   1992  3         0       156
1of1  2.355  0.724  0.633  0.998  21:42:45  1     2.1   1992  3         2       154
....
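If you want to post-process the table in a script, the following is a minimal Python sketch. It assumes the output is plain whitespace-separated text with one header line, as in the sample above; the helper name parse_snapshot is hypothetical and is not part of clusterpunch.

```python
# Sample rows copied from the table above. In practice this text would come
# from capturing the standard output of clustersnapshot.
SAMPLE = """\
host b_all b_cpu b_io b_mem date live load mhz nrunning nusers uptime
0of1 2.336 0.718 0.624 0.994 22:20:48 1 2.1 1992 3 0 108
0of8 2.553 0.517 1.243 0.794 22:20:47 1 0.1 2792 1 3 90
1of0 2.345 0.728 0.619 0.998 22:20:48 1 2.1 1992 3 0 156
"""


def parse_snapshot(text):
    """Turn whitespace-separated snapshot output into a list of dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    records = []
    for fields in rows:
        # Skip anything that does not line up with the header, such as a
        # trailing "..." or a summary line with fewer columns.
        if len(fields) != len(header):
            continue
        records.append(dict(zip(header, fields)))
    return records


nodes = parse_snapshot(SAMPLE)

# All values are kept as strings; convert only where a comparison needs it.
idle = [n["host"] for n in nodes if float(n["load"]) < 1.0]
print(idle)
```

Keeping every field as a string avoids guessing at types for columns such as date; the caller converts only the columns it compares.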
Custom Stats
You can retrieve stats using any command understood by the clusterpunchserver daemon. These commands are described in the clusterpunch.pm module documentation. A command is built from the names of the punches defined in the configuration files. If you have a punch like
<punch>
  name = mypunch01
  ...
</punch>
then you can have all the nodes execute the punch using
-c "mypunch01"
If your punch is designed to accept arguments, you should use
-c "mypunch01(arg,arg,...)"
# a custom load benchmark - get the 15 minute load for nodes and have
# them also report the time in HH:MM format
> clustersnapshot -c "load(15);date(%H:%M)"
host  date   live  load
0of0  22:23  1     2.0
0of1  22:23  1     2.0
0of2  22:23  1     2.0
0of3  22:23  1     2.0
0of4  22:23  1     2.0
...
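When the command string is assembled by a script rather than typed by hand, a small helper keeps the punch(arg,...) and semicolon syntax in one place. This is a sketch only: punch_call and build_command are hypothetical names, and the syntax is inferred from the examples above, not from the clusterpunch.pm documentation.

```python
def punch_call(name, *args):
    """Format one punch, with optional arguments, e.g. load(15)."""
    if not args:
        return name
    return "{}({})".format(name, ",".join(str(a) for a in args))


def build_command(*calls):
    """Join several punch calls with ';' as in the -c example above."""
    return ";".join(calls)


command = build_command(punch_call("load", 15), punch_call("date", "%H:%M"))

# Passing the arguments as a list (for example to subprocess.run) avoids
# shell quoting problems with the parentheses, semicolon and percent signs.
argv = ["clustersnapshot", "-c", command]
print(argv)
```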
Sorting Stats
You can sort the stats by the value of any punch. The way a punch is sorted is defined in the configuration file using the 'sort' and 'alphasort' directives. For example, the punch that returns the MHz rating is sorted in descending order, while load and benchmark times are sorted in ascending order. Either way, the more desirable nodes are at the top ;)
> clustersnapshot -c "load" -s "load"
host   live  load
0of8   1     0.0
2of8   1     1.9
7of7   1     2.0
4of8   1     2.0
4of3   1     2.0
...
3of4   1     5.1
2of4   1     5.1
1of4   1     5.3
6of4   1     6.0
TOTAL  60    184.5
Well, looks like 0of8 is sitting idle and 6of4 is cooking.
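The "best node first" idea can be sketched in Python as below. The DESCENDING table is a hypothetical stand-in for the per-punch sort direction that clustersnapshot reads from its configuration; the syntax of the real directives is not shown here. The three rows are taken from the basic stat table earlier in this document.

```python
# Hypothetical direction table: True means larger values are more desirable
# and should be listed first; False means smaller values come first.
DESCENDING = {"mhz": True, "load": False, "b_all": False}

rows = [
    {"host": "0of1", "b_all": "2.336", "load": "2.1", "mhz": "1992"},
    {"host": "0of4", "b_all": "2.837", "load": "2.1", "mhz": "2522"},
    {"host": "0of8", "b_all": "2.553", "load": "0.1", "mhz": "2792"},
]


def sort_nodes(rows, punch):
    """Return rows ordered so the most desirable node for `punch` is first."""
    return sorted(rows, key=lambda r: float(r[punch]),
                  reverse=DESCENDING[punch])


by_load = [r["host"] for r in sort_nodes(rows, "load")]
by_mhz = [r["host"] for r in sort_nodes(rows, "mhz")]
print(by_load)
print(by_mhz)
```

Python's sort is stable, so nodes with equal values (0of1 and 0of4 both report a load of 2.1) keep their input order.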
Adjusting Timeout
> clustersnapshot -t 15
Setting Port and/or Broadcast
> clustersnapshot -p 8095 -b 10.1.2.255
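If you know the cluster subnet but not its broadcast address, Python's standard ipaddress module can derive it. The /24 prefix below is an assumption made for illustration; the example above only gives the address 10.1.2.255.

```python
import ipaddress

# Assumed subnet: a /24 network is one layout for which 10.1.2.255 is the
# broadcast address. Substitute the real network of your cluster.
network = ipaddress.ip_network("10.1.2.0/24")
broadcast = str(network.broadcast_address)

# Argument list matching the example above.
argv = ["clustersnapshot", "-p", "8095", "-b", broadcast]
print(argv)
```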
Debug Mode
To receive debug text during polling, use -d. The output is produced by the excellent Data::Dumper module. As you can see, the nodes send a hash back to the client.
> clustersnapshot -d
...
'1of8' => {
  'b_mem' => '0.795894',
  'load' => '2.08',
  'nusers' => '0',
  'live' => 1,
  'uptime' => '90.5272106481481',
  'b_io' => '1.234498',
  'b_cpu' => '0.518359',
  'b_all' => '2.548751',
  'date' => '22:28:19',
  'host' => '1of8',
  'nrunning' => 3,
  'mhz' => 2792
},
...
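Most values in the hash arrive as strings, so they need converting before any arithmetic. The sketch below transcribes the '1of8' entry above into a Python dict by hand (it does not parse Data::Dumper output) and uses a hypothetical helper, to_number, to convert what looks numeric.

```python
# The '1of8' entry from the debug output above, transcribed by hand.
reply = {
    "1of8": {
        "b_mem": "0.795894", "load": "2.08", "nusers": "0", "live": 1,
        "uptime": "90.5272106481481", "b_io": "1.234498",
        "b_cpu": "0.518359", "b_all": "2.548751", "date": "22:28:19",
        "host": "1of8", "nrunning": 3, "mhz": 2792,
    },
}


def to_number(value):
    """Return value as a float when it looks numeric, else leave it alone."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


typed = {
    host: {key: to_number(val) for key, val in stats.items()}
    for host, stats in reply.items()
}

node = typed["1of8"]
parts = node["b_cpu"] + node["b_io"] + node["b_mem"]
print(node["load"], node["date"], round(parts, 6))
```

In this one sample, b_all equals the sum of b_cpu, b_io and b_mem; whether that holds in general depends on how the benchmark punches are defined.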
DESCRIPTION
This is part of the clusterpunch system. Each node that you wish to monitor must be running the clusterpunchserver daemon, documented in clusterpunchserver. Once the server is running, you can poll it and send commands using the API documented in clusterpunchclient.pm, or through the various utilities, such as this one.
SEE ALSO
Daemons
clusterpunchserver, clusterpunch.start, clusterpunch.shutdown
API
clusterpunch.pm
Utilities
benchdriver, clusterbench, clusterlogin, clusternodecount, clustersnapshot
CHANGES
AUTHOR
Martin Krzywinski (martink@bcgsc.ca), January 2003
$Id: clustersnapshot,v 1.8 2003/02/03 21:13:16 martink Exp $