Clusterpunch: a distributed mini-benchmark system for clusters

Documentation—clustersnapshot

NAME

clustersnapshot - retrieve live benchmark and capacity stats for nodes


SYNOPSIS

The clustersnapshot utility retrieves a live table of statistics detailing node capacity and resources.


Retrieving Stats

The basic stat table shows all nodes.

 > clustersnapshot
        host  b_all  b_cpu   b_io  b_mem     date live load   mhz nrunning nusers  uptime
        0of1  2.336  0.718  0.624  0.994 22:20:48    1  2.1  1992        3      0     108
        0of3  2.293  0.726  0.580  0.987 21:42:55    1  2.1  1992        3      0     156
        0of4  2.837  0.574  1.407  0.856 22:20:47    1  2.1  2522        3      0     156
        0of8  2.553  0.517  1.243  0.794 22:20:47    1  0.1  2792        1      3      90
        1of0  2.345  0.728  0.619  0.998 22:20:48    1  2.1  1992        3      0     156
        1of1  2.355  0.724  0.633  0.998 21:42:45    1  2.1  1992        3      2     154
        ....
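
Because the table is whitespace-delimited with a single header row, it is easy to post-process in a script. The following is a minimal sketch, assuming the output looks like the sample above. The helper name parse_snapshot is hypothetical and not part of clusterpunch. The sample rows are copied from the table above.

```python
SAMPLE = """\
        host  b_all  b_cpu   b_io  b_mem     date live load   mhz nrunning nusers  uptime
        0of1  2.336  0.718  0.624  0.994 22:20:48    1  2.1  1992        3      0     108
        0of4  2.837  0.574  1.407  0.856 22:20:47    1  2.1  2522        3      0     156
        0of8  2.553  0.517  1.243  0.794 22:20:47    1  0.1  2792        1      3      90
        ....
"""


def parse_snapshot(text):
    """Turn whitespace-delimited clustersnapshot output into a list of dicts.

    Hypothetical helper: assumes the first non-blank line is the header.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    records = []
    for fields in rows:
        # Skip elision markers, summary rows such as TOTAL, and malformed lines.
        if len(fields) != len(header) or fields[0] == "TOTAL":
            continue
        records.append(dict(zip(header, fields)))
    return records


nodes = parse_snapshot(SAMPLE)
# Lower benchmark times are better, so take the minimum b_all.
quickest = min(nodes, key=lambda n: float(n["b_all"]))
print(quickest["host"], quickest["b_all"])
```

All values are kept as strings. Convert them with float() or int() only for the columns you need.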


Custom Stats

You can retrieve stats using any command understood by the clusterpunchserver daemon. These commands are described in the clusterpunch.pm module documentation. A command is made up of the names of punches defined in the configuration files. If you have a punch like

 <punch>
  name = mypunch01
  ...
 </punch>

then you can have all the nodes execute the punch using

 -c "mypunch01"

If your punch is designed to accept arguments, you should use

 -c "mypunch01(arg,arg,...)"

 # a custom load benchmark - get the 15 minute load for nodes and have 
 # them also report the time in HH:MM format
 > clustersnapshot -c "load(15);date(%H:%M)"
        host     date live load
        0of0    22:23    1  2.0
        0of1    22:23    1  2.0
        0of2    22:23    1  2.0
        0of3    22:23    1  2.0
        0of4    22:23    1  2.0
        ...
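
When commands are generated by a script, the string passed to -c can be assembled from punch names and arguments. This is a sketch only: the helper names punch_call and command_string are hypothetical, and the ';' separator is inferred from the load/date example above rather than from the clusterpunch.pm documentation.

```python
def punch_call(name, *args):
    """Format one punch as 'name' or 'name(arg,arg,...)'."""
    if not args:
        return name
    return "{}({})".format(name, ",".join(str(a) for a in args))


def command_string(*calls):
    """Join several punch calls with ';' (separator assumed from the example)."""
    return ";".join(calls)


cmd = command_string(punch_call("load", 15), punch_call("date", "%H:%M"))
print(cmd)  # load(15);date(%H:%M)
```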


Sorting Stats

You can sort the stats by the value of any punch. The way a punch is sorted is defined in the configuration file using the 'sort' and 'alphasort' directives. For example, the punch that returns the MHz rating is sorted in descending order, while load and benchmark times are sorted in ascending order. Either way, the more desirable nodes are at the top ;)

 > clustersnapshot -c "load" -s "load"
        host live load
        0of8    1  0.0
        2of8    1  1.9
        7of7    1  2.0
        4of8    1  2.0
        4of3    1  2.0
        ...
        3of4    1  5.1
        2of4    1  5.1
        1of4    1  5.3
        6of4    1  6.0
       TOTAL   60 184.5

Well, looks like 0of8 is sitting idle and 6of4 is cooking.
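
The per-punch sort direction can be illustrated with a short sketch. The DESCENDING set below is a hypothetical stand-in for what the 'sort' directive defines in the real configuration, and the code assumes numeric values only (it does not model 'alphasort'). The three rows reuse values from the basic stat table.

```python
# Hypothetical: punches where larger values are better.
# In clusterpunch the direction comes from the configuration file.
DESCENDING = {"mhz"}

nodes = [
    {"host": "0of1", "load": "2.1", "mhz": "1992"},
    {"host": "0of4", "load": "2.1", "mhz": "2522"},
    {"host": "0of8", "load": "0.1", "mhz": "2792"},
]


def sort_nodes(rows, punch):
    """Order rows so that the more desirable nodes come first."""
    return sorted(rows, key=lambda n: float(n[punch]), reverse=punch in DESCENDING)


by_load = [n["host"] for n in sort_nodes(nodes, "load")]
by_mhz = [n["host"] for n in sort_nodes(nodes, "mhz")]
print(by_load)
print(by_mhz)
```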


Adjusting Timeout

Use -t to set the polling timeout.

 > clustersnapshot -t 15


Setting Port and/or Broadcast

Use -p to set the port and -b to set the broadcast address.

 > clustersnapshot -p 8095 -b 10.1.2.255
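
If you call clustersnapshot from a script, the options shown in this document (-c, -s, -t, -p, -b, -d) can be collected into an argument list. This is a minimal sketch. The function snapshot_argv and its parameter names are hypothetical, and it only builds the list without running anything.

```python
def snapshot_argv(command=None, sort=None, timeout=None, port=None,
                  broadcast=None, debug=False):
    """Build an argument list for clustersnapshot (hypothetical helper)."""
    argv = ["clustersnapshot"]
    if command is not None:
        argv += ["-c", command]
    if sort is not None:
        argv += ["-s", sort]
    if timeout is not None:
        argv += ["-t", str(timeout)]
    if port is not None:
        argv += ["-p", str(port)]
    if broadcast is not None:
        argv += ["-b", broadcast]
    if debug:
        argv.append("-d")
    return argv


argv = snapshot_argv(port=8095, broadcast="10.1.2.255")
print(" ".join(argv))
# To run it, with clustersnapshot installed and on PATH:
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with commands such as "load(15);date(%H:%M)".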


Debug Mode

To receive debug text during polling, use -d. The output is produced by the excellent Data::Dumper module. As you can see, the nodes send a hash back to the client.

 > clustersnapshot -d
          ...
          '1of8' => {
                      'b_mem' => '0.795894',
                      'load' => '2.08',
                      'nusers' => '0',
                      'live' => 1,
                      'uptime' => '90.5272106481481',
                      'b_io' => '1.234498',
                      'b_cpu' => '0.518359',
                      'b_all' => '2.548751',
                      'date' => '22:28:19',
                      'host' => '1of8',
                      'nrunning' => 3,
                      'mhz' => 2792
                    },
          ...
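
Most values in the hash arrive as strings. A client that wants to compute with them can coerce the numeric fields first. The sketch below mirrors the 1of8 record as a Python dict. The coerce helper and the choice of which keys stay as text are assumptions made for illustration. In this record, b_all equals the sum of b_cpu, b_io and b_mem, which the last lines check.

```python
record = {
    "b_mem": "0.795894",
    "load": "2.08",
    "nusers": "0",
    "live": 1,
    "uptime": "90.5272106481481",
    "b_io": "1.234498",
    "b_cpu": "0.518359",
    "b_all": "2.548751",
    "date": "22:28:19",
    "host": "1of8",
    "nrunning": 3,
    "mhz": 2792,
}

# Assumption: these keys hold text and must not be converted.
TEXT_KEYS = {"host", "date"}


def coerce(rec):
    """Return a copy of rec with every non-text value converted to float."""
    return {k: (v if k in TEXT_KEYS else float(v)) for k, v in rec.items()}


node = coerce(record)
total = node["b_cpu"] + node["b_io"] + node["b_mem"]
print(node["host"], round(total, 6), node["b_all"])
```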


DESCRIPTION

This is part of the clusterpunch system. Each node that you wish to monitor must be running the clusterpunchserver daemon, documented in clusterpunchserver. Once the server is running, you can poll it and send commands using the API documented in clusterpunch.pm, or through the various utilities, such as this one.


SEE ALSO


Daemons

clusterpunchserver, clusterpunch.start, clusterpunch.shutdown


API

clusterpunch.pm


Utilities

benchdriver, clusterbench, clusterlogin, clusternodecount, clustersnapshot


CHANGES


AUTHOR

Martin Krzywinski (martink@bcgsc.ca), January 2003

$Id: clustersnapshot,v 1.8 2003/02/03 21:13:16 martink Exp $