Documentation—clusterpunch.conf
NAME
clusterpunch.conf - configuration file for clusterpunch
SYNOPSIS
# general parameters
param1 = value
param2 = value

# sort methods - multiple blocks
<sort>
statistic = value
sort = ascending | descending
format = value
</sort>
...
<sort>
statistic = value
sort = ascending | descending
format = value
</sort>

# punches - multiple blocks
<punch>
name = value
statistic = value
cumulative = value
valuemap = function
valuetype = timer | return
appendargs = true
format = value
sort = ascending | descending
function <<CODE
...perl code here...
CODE
</punch>
... more punches here
DESCRIPTION
Reading Configuration Files
By default, all clusterpunch daemon and utility programs look for a configuration file in the following locations:
~/.clusterpunch
../etc/clusterpunch.conf (relative to location of binary)
/usr/local/etc/clusterpunch.conf
/etc/clusterpunch.conf
An attempt is made to read all of these files. If a parameter is defined in /etc/clusterpunch.conf, it overrides the same parameter defined in the other files. Parameters defined in blocks (sorts and punches) are not overwritten, but appended to the global list of sorts and punches. Thus, it is possible to define a network-wide core list of punches in /usr/local/etc/clusterpunch.conf and then define host-specific punches in /etc/clusterpunch.conf.
To read a specific configuration file instead, use the -f configfile flag, which is accepted by all daemon and utility scripts. For example,
clusterbench -f /path/to/file.conf
If a custom file is found, no other files are read.
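As an illustration of the layering described above, here is a minimal sketch of a shared file and a host-specific file. The port and timeout values, the punch names (cpuloop, users) and their statistics are made up for the example.

```conf
# /usr/local/etc/clusterpunch.conf - shared by all nodes (hypothetical values)
port = 8001
timeout = 2

<punch>
name = cpuloop
statistic = b_cpu
valuetype = timer
format = %6.2f
function <<CODE
for (my $i=0;$i<1e6;$i++) { rand() }
CODE
</punch>
```

```conf
# /etc/clusterpunch.conf - this host only (hypothetical values)
timeout = 5

<punch>
name = users
statistic = users
valuetype = return
format = %4d
system = "who | wc -l"
</punch>
```

With both files in place, this host should end up with port = 8001 and timeout = 5, since the value in /etc/clusterpunch.conf wins, and with both the cpuloop and users punches, since blocks are appended.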
General Parameters
General parameters define how clusterpunch works across the network. Each parameter is defined using
parameter = value
The following parameters are possible
- logdir = /path/to/logdir
-
Directory to which each clusterpunchserver writes its logfiles. This directory must exist and be writable by the server.
- logging = true | false
-
Toggles logging.
- verbose = true | false
-
Controls whether clusterpunchserver prints messages to STDOUT. If the server is running in the background, this value has no effect.
- daemon = true | false
-
Controls whether the server is started in the background by default
- debug = true | false
-
Controls debug output
- port = PORT
-
Specifies the UDP port on which to listen and over which to send punches.
- broadcast = BROADCAST_ADDRESS
-
Specifies the UDP broadcast address (e.g. 10.1.2.255) to use to announce punches
- timeout = TIMEOUT
-
Specifies the number of seconds to wait for responses from servers before client utilities stop listening.
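Put together, the general parameters section of a configuration file might look like the sketch below. Every value is illustrative: the log directory and port number are assumptions, and the broadcast address is the one used as an example above.

```conf
# general parameters (illustrative values only)
logdir = /var/log/clusterpunch
logging = true
verbose = false
daemon = true
debug = false
port = 8001
broadcast = 10.1.2.255
timeout = 2
```

Remember that the logdir directory is not created for you, so it has to exist before the server is started.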
Sort Types
In order to sort values by the results of punches, it's necessary to store how each punch should be treated. For example, sometimes high values are desirable (MHz) and sometimes low values are better (benchmark times). Generally, the way a value should be treated is controlled using
sort = ascending | descending
alphasort = true | false
There are two punches which are sent to the server by default. These are the 'host' and the 'live' punch. The 'host' punch simply retrieves the host name and the 'live' punch simply asks the host for a 1. Since these punches are not defined in the <punch> blocks, their sorts are defined in the <sort> blocks.
In addition, any cumulative statistics that you define in the <punch> blocks should have their sorts defined in the <sort> block. A <sort> block looks like
<sort>
statistic = live
sort = ascending
format = %4d
</sort>
The parameters understood by this block are
- statistic = VALUE
-
The VALUE is the name of the punch or statistic for which the sort applies.
- sort = ascending | descending
-
By default, sorts are ascending. If you want a descending sort, define it with 'sort = descending'.
- format = FORMAT
-
The way the results are displayed is controlled by the printf-like FORMAT.
- alphasort = true | false
-
If you want the sort to be asciibetical, set alphasort to be true. The default is a numerical sort.
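As a sketch of how these parameters combine, the blocks below define sorts for the default 'host' punch and for a cumulative statistic. The name b_all is borrowed from the cumulative example later in this document, and the format strings are assumptions chosen for illustration.

```conf
# host names are strings, so sort them asciibetically
<sort>
statistic = host
alphasort = true
format = %10s
</sort>

# b_all sums benchmark times, so low values should come first
<sort>
statistic = b_all
sort = ascending
format = %6.2f
</sort>
```

The sort order is spelled out for b_all rather than left to the default, which keeps the intent obvious to whoever reads the file next.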
Introduction to Punches
The flexibility in clusterpunch lies in its ability to accept user-defined punches. You can define punches by specifying the Perl code, or specifying an external binary to run. The latter case is useful if you have your own benchmark tools.
A punch is defined in a <punch> block like this
<punch> parameter = value parameter = value ... </punch>
A punch can be thought of like ... well, a punch. Think of a sweaty boxer. The idea is that you 'punch' your cluster nodes and see how 'quickly they get up'. Ok, enough with the metaphors. The punch is a mini-benchmark. It's supposed to be mini- so that you don't use up your CPU cycles only for benchmarks and so that you minimally affect other running jobs.
Punches can return values, like diagnostic punches. These are nothing more than requests for information. For example, a punch might ask the node to return its total MHz rating. The benchmark punches, on the other hand, are meant to be timed. They don't return any values and the sole purpose of running them is to see how responsive your nodes are at any time.
Sample Punches
Here is a simple punch
<punch>
name = punch1
statistic = bench1
valuetype = timer
format = %6.2f
function <<CODE
for (my $i=0;$i<1e6;$i++) { rand () }
CODE
</punch>
This punch is called 'punch1'. The name specifies the function which is used to call the punch from utility scripts. If you wanted to have the nodes execute this punch you would use
-c "punch1"
The 'statistic', on the other hand, is the label of the data value returned. Since 'valuetype' is set to timer, the return value will be the time taken to run this punch. The code to run is defined by way of a here-document with 'function' as the name of the parameter. The 'format' parameter defines the way the punch value will be formatted on output to the screen. In this punch, rand() is called 1,000,000 times.
For example, running
> clustersnapshot -c "punch1"
asks all the nodes to run this punch and returns
host  bench1  live
0of8   0.406     1
5of8   0.407     1
1of7   0.407     1
3of8   0.407     1
5of7   0.407     1
9of7   0.407     1
...
Notice that the return values are formatted with %6.2f and they are sorted with an ascending numerical sort (default sort type). The 'host' and 'live' punch results are displayed, since these are default punches that are always carried out.
Within the Perl code, you can accept input parameters using something like this
function <<CODE
my ($x,$y) = @_;
$x ||= 5;
$y ||= 10;
....
CODE
and then call the punch with
-c "punchname(10,15)"
so that x=10 and y=15. In the code definition above, (x,y) get the default values (5,10) if they are not defined or zero.
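For a complete picture, here is a sketch of a whole punch block that takes arguments with defaults. The punch name randloop, the statistic b_rand and the meaning given to the two arguments (iteration count and upper bound for rand) are invented for this example.

```conf
# randloop(n,m): call rand(m) n times; defaults are n=1e6, m=10
<punch>
name = randloop
statistic = b_rand
valuetype = timer
format = %6.2f
function <<CODE
my ($n,$m) = @_;
$n ||= 1e6;
$m ||= 10;
for (my $i=0;$i<$n;$i++) { rand($m) }
CODE
</punch>
```

You would then call it with -c "randloop(500000,100)" for a shorter run, or with -c "randloop" to fall back on the defaults. Because the defaults are applied with ||=, passing 0 for either argument also selects the default.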
You can define punches which make a system call using a punch like this
<punch>
name = punch2
statistic = cat
valuetype = timer
system = "/bin/cat /home/martink/work/glimpse/mail/.glimpse_index | wc"
</punch>
The command defined by 'system' will be called and timed.
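Both sample punches above are timed. For contrast, here is a sketch of a diagnostic punch which returns a value instead, in this case the total MHz rating of the node. It assumes a Linux node whose /proc/cpuinfo contains one 'cpu MHz' line per processor. The punch name and statistic (mhz) are made up.

```conf
# sum the 'cpu MHz' lines in /proc/cpuinfo (Linux only); returns 0 if unreadable
<punch>
name = mhz
statistic = mhz
valuetype = return
sort = descending
format = %8.1f
function <<CODE
open(my $fh, "<", "/proc/cpuinfo") or return 0;
my $mhz = 0;
while (<$fh>) { $mhz += $1 if /^cpu MHz\s*:\s*([\d.]+)/ }
close($fh);
return $mhz;
CODE
</punch>
```

The sort is set to descending because, for this statistic, higher values are the desirable ones.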
Punches in Detail
The following parameters are supported within punch blocks.
- name = NAME
-
This defines the name of the punch. The name must be unique and is used to call the punch from commands passed to utilities. If you have a punch named bob, you can call it with
-c "bob(arg,arg);jerry"
with arguments as shown. If you also have a jerry punch defined, separate them using ;. If bob does not take arguments, use -c "bob".
- namegroup = NAME1,NAME2,NAME3,...
-
If you would like to split the punch value into multiple statistics, use namegroup and valuedelim. The namegroup parameter defines the names of the statistics which are populated with the punch value, split along valuedelim. For example, if the punch return value is "1:2:3" and
namegroup = val1,val2,val3
valuedelim = :
then the following statistics will be populated with values
val1 = 1
val2 = 2
val3 = 3
If you have more values than statistics, any unassigned values will be discarded. The namegroup/valuedelim feature is useful when calculating a list of values is just as fast as calculating a single value. For example, using a load monitor tool like 'atop', or 'atsar' to calculate CPU utilization, you need to spend, for example, 2 seconds to measure CPU sys/idle/nice/user utilization. However, at the end of 2 seconds you get all four values which really belong in separate statistics. If you were to calculate each one independently, you'd need 8 seconds.
- statistic = STATISTIC
-
The statistic is the key in the response hash populated by the value of the punch. The statistic will show up in the table produced by clustersnapshot. The statistic is also used to sort the table.
- cumulative = CUMULATIVE
-
Sometimes you want to create virtual statistics, which are sums of other punches. For example, the benchmarks for memory, I/O and CPU subsystems have their own statistics (b_mem, b_io, b_cpu) but also add their values to b_all. Since the cumulative statistic is not defined as a punch statistic, you have to define its sort methods in a <sort> block (see above).
You can have multiple cumulative statistics, but only one per punch. Cumulative statistics are not implemented for punches which use namegroup.
- valuemap = FUNCTION
-
This is a fun parameter. You can map the punch value to another value using a function. For example, suppose your punch value (whether returned or timed) is 10. If you want to report the log of this value instead, use
valuemap = return log($_[0])
Within the FUNCTION, the punch value is available as $_[0].
- valuetype = timer | return
-
If you want the punch timed, use 'timer'. If you want to use the return value of the punch code to be used as the punch value, use 'return'. Typically, benchmark punches are timed and diagnostic punches return values. Whatever the value, timed or returned, you can modify it with 'valuemap' described above.
- valuedelim = STRING
-
If your punch returns a list of values concatenated with STRING, you can use namegroup to store these values in different statistics. Specify the delimiter to use to split the punch return string with 'valuedelim'.
- format = FORMAT
-
The value of the punch will be formatted using FORMAT, which is expected to have a printf-type syntax.
- sort = ascending | descending
-
Associate a sort order with the punch statistic. By default, all sorts are ascending. If you want an ascending sort, you don't need to specify the sort type.
- alphasort = true | false
-
By default all sorts are numerical. If you want to sort by asciibetical order, use 'alphasort = true'.
- appendargs = true | false
-
You can append the arguments passed to a punch to its statistic by setting appendargs to true (default is false). This is useful when you are calling the same punch with different parameters in the same call. For example,
load()
is a punch which can take parameters to sample 1 min, 5 min and 15 min loads. The load punch statistic is 'load'. If you do not append the arguments to the statistic name, you'll overwrite the values and get the load value from the last load punch.
bin/clustersnapshot -c "load;load(5);load(15)"
host  live  load  load15  load5
0of0     1   0.1    0.18   0.20
0of1     1   0.0    0.00   0.00
- function << CODE ... CODE
-
Using a here-document syntax, you can define the Perl code to be executed in a punch. The code must be free of syntax errors! Please use the benchdriver utility to test your punches. Within the code you have the option of calling a logging function.
... Log("message to log"); ...
If you want the function to accept arguments, use @_.
name = argpunch
function <<CODE
my ($x,$y) = @_;
...
CODE
then call your punch with
-c "argpunch(10,20)"
- system = COMMAND
-
Instead of executing Perl code, you can make a call to the system COMMAND. This is useful if you have your own binary which you want to time.
valuetype = timer
system = "/bin/specialbench -param 10 &> /dev/null"
If the punch 'valuetype' is specified as 'timer' then the call to the binary will be automatically timed. If you specify 'return', however, you'll get back whatever the binary output to STDOUT. Don't pipe to /dev/null in that case!
valuetype = return
system = "who | wc -c"
Any leading spaces in all output lines will be stripped. New lines will be preserved, except for the last new line.
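To tie several of these parameters together, here is a sketch with three punches and one sort block. The first punch splits a single return value into three statistics with namegroup and valuedelim. The other two add their timings to a shared cumulative statistic. Several things here are assumptions: the punch names (loads, cpuloop, memloop), the load1/load5/load15 statistic names, the presence of /proc/loadavg (Linux), and the idea that a namegroup punch needs no separate statistic line. The b_cpu, b_mem and b_all names follow the cumulative description above.

```conf
# one read of /proc/loadavg fills three statistics (Linux only)
<punch>
name = loads
namegroup = load1,load5,load15
valuedelim = :
valuetype = return
function <<CODE
open(my $fh, "<", "/proc/loadavg") or return "0:0:0";
my $line = <$fh>;
close($fh);
my @f = split(" ", $line);
return join(":", @f[0..2]);
CODE
</punch>

# two timed benchmarks, each also added to b_all
<punch>
name = cpuloop
statistic = b_cpu
cumulative = b_all
valuetype = timer
format = %6.2f
function <<CODE
for (my $i=0;$i<1e6;$i++) { rand() }
CODE
</punch>

<punch>
name = memloop
statistic = b_mem
cumulative = b_all
valuetype = timer
format = %6.2f
function <<CODE
my @a = (0) x 1e6;
my $s = 0;
$s += $_ for @a;
CODE
</punch>

# b_all is not a punch statistic, so its sort lives in a sort block
<sort>
statistic = b_all
sort = ascending
format = %6.2f
</sort>
```

The loads punch carries no cumulative parameter, since cumulative statistics are not implemented for punches which use namegroup. As with any new punch, it is worth running these through benchdriver before putting them on the network.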
SEE ALSO
Daemons
clusterpunchserver, clusterpunch.start, clusterpunch.shutdown
API
clusterpunch.pm
Utilities
benchdriver, clusterbench, clusterlogin, clusternodecount, clustersnapshot
CHANGES
AUTHOR
Martin Krzywinski (martink@bcgsc.ca) January 2003
$Id: clusterpunch.conf,v 1.6 2003/02/04 00:18:34 martink Exp $