Sample Benchmarks
I'll update these benchmark times periodically as more nodes become available.
Sample Benchmark
Below is the profile of our IBM x330 cluster at the Genome Sciences Centre. The profile was captured when the cluster was idle. The CPU, IO and memory benchmarks are histogrammed below the table, showing the distribution of times across cluster nodes.
clustersnapshot -c "benchio(80e3,2);benchcpu(5e5);benchmem(70,1e6);mhz;load(1)" -t 60
host b_all b_cpu b_io b_mem live load mhz
0of1 8.427 3.618 1.622 3.187 1 0.2 1992
8of0 8.482 3.584 1.623 3.275 1 0.1 1992
3of2 8.719 3.864 1.592 3.264 1 0.1 1992
9of0 8.741 3.856 1.624 3.262 1 0.1 1992
4of0 8.742 3.856 1.621 3.264 1 0.1 1992
2of0 8.746 3.861 1.616 3.269 1 0.2 1992
2of2 8.750 3.864 1.621 3.265 1 0.2 1992
9of2 8.755 3.856 1.632 3.266 1 0.2 1992
3of0 8.757 3.856 1.635 3.266 1 0.2 1992
7of0 8.763 3.861 1.629 3.273 1 0.2 1992
6of2 8.766 3.867 1.626 3.272 1 0.1 1992
5of0 8.768 3.856 1.647 3.266 1 0.2 1992
1of3 8.778 3.875 1.638 3.265 1 0.2 1992
0of3 8.787 3.879 1.635 3.273 1 0.2 1992
2of1 8.800 3.867 1.650 3.283 1 0.2 1992
5of8 8.955 2.772 3.363 2.820 1 0.2 2792
6of8 8.957 2.778 3.300 2.880 1 0.2 2792
1of8 8.958 2.774 3.356 2.828 1 0.1 2792
7of7 8.979 2.776 3.331 2.872 1 0.1 2792
2of7 9.011 2.776 3.352 2.883 1 0.2 2792
2of8 9.015 2.774 3.357 2.884 1 0.2 2792
7of8 9.024 2.774 3.370 2.880 1 0.2 2792
1of7 9.031 2.797 3.350 2.884 1 0.2 2792
0of8 9.037 2.798 3.356 2.882 1 0.2 2792
4of7 9.078 2.794 3.401 2.883 1 0.2 2792
8of7 9.084 2.799 3.397 2.889 1 0.2 2792
8of8 9.088 2.761 3.443 2.884 1 0.2 2792
6of7 9.092 2.772 3.451 2.869 1 0.1 2792
5of7 9.100 2.797 3.420 2.883 1 0.2 2792
9of7 9.117 2.799 3.431 2.887 1 0.2 2792
3of7 9.161 2.797 3.479 2.884 1 0.2 2792
4of8 9.593 2.795 3.912 2.886 1 0.1 2792
6of4 9.688 3.063 3.637 2.988 1 0.2 2522
0of5 9.697 3.109 3.613 2.975 1 0.2 2522
1of4 9.708 3.074 3.650 2.984 1 0.2 2522
7of3 9.738 3.065 3.689 2.984 1 0.2 2522
0of0 9.740 3.875 2.600 3.265 1 0.2 1992
3of4 9.744 3.053 3.708 2.982 1 0.2 2522
4of3 9.745 3.060 3.700 2.984 1 0.2 2522
0of4 9.759 3.075 3.699 2.985 1 0.2 2522
8of4 9.767 3.067 3.714 2.986 1 0.2 2522
9of3 9.769 3.075 3.711 2.983 1 0.2 2522
3of8 9.781 2.799 4.097 2.885 1 0.2 2792
1of5 9.782 3.111 3.686 2.985 1 0.2 2522
9of4 9.792 3.110 3.697 2.986 1 0.2 2522
5of3 9.802 3.060 3.757 2.985 1 0.2 2522
5of4 9.857 3.065 3.809 2.984 1 0.3 2522
4of4 9.866 3.057 3.827 2.982 1 0.2 2522
2of4 10.267 3.073 4.210 2.984 1 0.2 2522
3of3 10.422 3.060 4.378 2.984 1 0.3 2522
6of3 11.128 3.060 5.216 2.852 1 0.2 2522
8of3 11.216 3.072 5.163 2.981 1 0.2 2522
7of2 17.376 3.864 10.246 3.267 1 0.4 1992
6of0 17.526 3.856 10.404 3.266 1 0.4 1992
4of2 17.623 3.869 10.485 3.269 1 0.5 1992
6of1 17.772 3.865 10.632 3.274 1 0.4 1992
3of1 17.904 3.867 10.772 3.266 1 0.4 1992
1of2 18.034 3.864 10.907 3.263 1 0.4 1992
1of1 18.038 3.866 10.901 3.271 1 0.4 1992
5of2 18.117 3.864 10.990 3.264 1 0.5 1992
4of1 18.126 3.862 10.999 3.265 1 0.5 1992
8of1 18.133 3.867 10.996 3.269 1 0.3 1992
8of2 18.180 3.864 11.052 3.264 1 0.4 1992
9of1 18.198 3.868 11.061 3.270 1 0.5 1992
5of1 18.727 3.864 11.600 3.264 1 0.4 1992
0of2 18.761 3.866 11.625 3.269 1 0.5 1992
TOTAL 736.845 220.840 312.713 203.293 66 14.8 155412
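The grouping of nodes by speed can be checked directly from the table. The following is a minimal sketch, not part of clustersnapshot: it assumes the output is whitespace-separated text with the header shown above, and the helper names `parse_snapshot` and `mean_by_mhz` are made up for illustration. Only a handful of rows are embedded; the full table can be pasted in their place.

```python
from collections import defaultdict
from statistics import mean

# A few rows copied from the table above (hypothetical variable name).
SNAPSHOT = """\
host b_all b_cpu b_io b_mem live load mhz
0of1 8.427 3.618 1.622 3.187 1 0.2 1992
8of0 8.482 3.584 1.623 3.275 1 0.1 1992
5of8 8.955 2.772 3.363 2.820 1 0.2 2792
6of8 8.957 2.778 3.300 2.880 1 0.2 2792
6of4 9.688 3.063 3.637 2.988 1 0.2 2522
0of5 9.697 3.109 3.613 2.975 1 0.2 2522
7of2 17.376 3.864 10.246 3.267 1 0.4 1992
"""

def parse_snapshot(text):
    """Turn whitespace-separated snapshot text into a list of row dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    rows = []
    for fields in body:
        if fields[0] == "TOTAL":  # skip the summary row if it was pasted in
            continue
        row = dict(zip(header, fields))
        for key in header[1:]:    # every column except host is numeric
            row[key] = float(row[key])
        rows.append(row)
    return rows

def mean_by_mhz(rows, column):
    """Average one benchmark column for each distinct mhz value."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row["mhz"])].append(row[column])
    return {mhz: round(mean(values), 3) for mhz, values in sorted(groups.items())}

rows = parse_snapshot(SNAPSHOT)
for column in ("b_cpu", "b_io", "b_mem"):
    print(column, mean_by_mhz(rows, column))
```

Grouping on the mhz column rather than on host name keeps the sketch independent of how the nodes are named.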
Benchmark Distribution. Times for the CPU, IO, memory and combined (CPU+IO+memory) benchmarks are histogrammed in this figure. The trimodal distributions show that the cluster's nodes fall into three distinct groups. The nodes in fact come in three different speeds: 2x1GHz, 2x1.26GHz and 2x1.4GHz (P3 processors).
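A rough text version of such a histogram can be produced from the table values. This is only a sketch under my own assumptions: the bin width of 0.25 s is arbitrary, the function names are hypothetical, and only twelve b_cpu values (four from each speed group) are included.

```python
import math
from collections import Counter

# b_cpu times (seconds) for twelve nodes, copied from the table.
B_CPU = [3.864, 3.856, 3.861, 3.875,   # mhz 1992
         2.772, 2.778, 2.774, 2.797,   # mhz 2792
         3.063, 3.109, 3.074, 3.065]   # mhz 2522

def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each occupied bin."""
    counts = Counter(math.floor(value / bin_width) for value in values)
    return {index * bin_width: counts[index] for index in sorted(counts)}

def render(hist, bin_width):
    """Return one text line per occupied bin, with one '#' per node."""
    return [f"{edge:5.2f}-{edge + bin_width:5.2f} s  {'#' * count}"
            for edge, count in hist.items()]

for line in render(histogram(B_CPU, 0.25), 0.25):
    print(line)
```

Empty bins are not printed, so the gaps between the three groups show up as missing ranges rather than as blank rows.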
The relationship between node speed and each of the benchmark times is shown below. All nodes are based on the IBM x330 platform, with dual CPUs and an on-board SCSI-160 disk controller. All nodes have a single 18 GB disk. The main difference is the CPUs, which come in flavours of 1 GHz, 1.26 GHz and 1.4 GHz.
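CPU Benchmark Profile. The CPU benchmark makes 500,000 calls to rand(). The benchmark time falls in proportion to clock speed (roughly 3.86 s on the 2x1GHz nodes, 3.06 s on the 2x1.26GHz nodes and 2.78 s on the 2x1.4GHz nodes), which makes it a reasonable measure of CPU speed.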
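For readers who want to try something similar, here is a minimal sketch of a benchmark of this kind. It is not the benchcpu implementation, which is not shown here: Python's `random.random()` stands in for rand(), and the function name `bench_cpu` is hypothetical. Absolute times will not be comparable with the table.

```python
import random
import time

def bench_cpu(calls=500_000):
    """Time `calls` invocations of random.random(); return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(calls):
        random.random()
    return time.perf_counter() - start

elapsed = bench_cpu()
print(f"{elapsed:.3f} s for 500,000 calls")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution timer, so the result is not affected by wall-clock adjustments.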
Memory Benchmark Profile. The memory benchmark allocates and deallocates a 1,000,000-element array 70 times. This particular benchmark appears to scale linearly with CPU speed, much like the CPU benchmark.
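An allocate/deallocate loop of this shape might look like the sketch below. It assumes a Python list as the array type and is not the benchmem implementation; `bench_mem` is a hypothetical name, and the timings depend on the interpreter's allocator as much as on the hardware.

```python
import time

def bench_mem(repeats=70, size=1_000_000):
    """Allocate and release a `size`-element list `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        data = [0] * size  # allocate the array
        del data           # drop the only reference so it is freed
    return time.perf_counter() - start

elapsed = bench_mem()
print(f"{elapsed:.3f} s for 70 allocations of 1,000,000 elements")
```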
IO Benchmark Profile. The IO benchmark writes two 80 MB files to /tmp. A group of 2x1GHz nodes appears to have very fast I/O (about 1.6 s) despite using the same disks as the nodes with slower I/O times. The cause of the low benchmark times is not known.
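A write benchmark along these lines can be sketched as follows. This is an illustration, not the benchio implementation: I am assuming 80 MB means 80,000,000 bytes, the name `bench_io` and its parameters are hypothetical, and the call to `os.fsync` is my own choice so that the timing includes pushing data to disk and not only to the page cache. The measured time also includes deleting each file.

```python
import os
import tempfile
import time

def bench_io(file_size=80_000_000, files=2, directory=None):
    """Write `files` files of `file_size` bytes, sync and delete them.

    Returns elapsed seconds. `directory` defaults to the system temp dir.
    """
    directory = directory or tempfile.gettempdir()
    chunk = b"\0" * 1_000_000  # write in 1 MB pieces to limit memory use
    start = time.perf_counter()
    for _ in range(files):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as handle:
                remaining = file_size
                while remaining > 0:
                    count = min(remaining, len(chunk))
                    handle.write(chunk[:count])
                    remaining -= count
                handle.flush()
                os.fsync(handle.fileno())  # force the data out to disk
        finally:
            os.remove(path)  # clean up even if the write fails
    return time.perf_counter() - start

# Small demonstration run; call bench_io() with the defaults for 2 x 80 MB.
elapsed = bench_io(file_size=8_000_000)
print(f"{elapsed:.3f} s to write two 8 MB files")
```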
Combined Benchmark Profile. The combined time shown is the sum of the three other benchmarks.