Hi all

I varied the number of compute nodes in the cluster and measured the speedup; the result is in the attachment.

As I understand it, there are three types of speedup: linear, sub-linear and super-linear. However, my speedup curve looks a little strange. I have attached it.
[Image 2: speedup curve]
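For reference: taking the 2-node run as the baseline, the measured speedup is S(n) = T(2) / T(n), and linear (ideal) speedup is n / 2. The Python sketch below shows the three-way classification. The 5% tolerance band and the function names are illustrative assumptions, not part of Hadoop or any tool.

```python
BASE_NODES = 2  # smallest cluster size in the experiment


def speedup(t_base: float, t_n: float) -> float:
    """Measured speedup of an n-node run relative to the baseline run."""
    return t_base / t_n


def ideal_speedup(n: int, base: int = BASE_NODES) -> float:
    """Linear speedup relative to the baseline cluster size."""
    return n / base


def classify(s: float, ideal: float, tol: float = 0.05) -> str:
    """Label a measured speedup against the linear ideal.

    tol is a hypothetical relative tolerance for calling a point 'linear'.
    """
    if s > ideal * (1 + tol):
        return "super-linear"
    if s < ideal * (1 - tol):
        return "sub-linear"
    return "linear"
```

For example, if a 2-node run took 1000 s and a 4-node run took 400 s, `classify(speedup(1000.0, 400.0), ideal_speedup(4))` gives `"super-linear"` (2.5 vs. an ideal of 2.0).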

The job is the sort program in example.jar; it uses only Hadoop's default MapReduce mechanism.

I use hadoop-1.2.1 with 8 map slots and 8 reduce slots per node (12 CPUs, 20 GB memory).
Settings: io.sort.mb = 512, block size = 512 MB, heap size = 1024 MB, reduce.slowstart = 0.05; all other parameters are left at their defaults.
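With a slowstart value of 0.05, reducers are scheduled once about 5% of the maps have completed, so map and reduce slots are likely busy at the same time. Here is a rough per-node budget in Python for that situation. It assumes "heap size" means the per-task child JVM heap (e.g. set through mapred.child.java.opts) and that it applies to all 16 slots. It ignores the DataNode, TaskTracker and OS.

```python
# Per-node figures from the post. HEAP_MB is assumed to be the per-task
# child JVM heap, applied to every map and reduce slot.
CPUS = 12
RAM_MB = 20 * 1024
MAP_SLOTS = 8
REDUCE_SLOTS = 8
HEAP_MB = 1024
IO_SORT_MB = 512  # map-side sort buffer, allocated inside the task heap


def node_budget(cpus=CPUS, ram_mb=RAM_MB, map_slots=MAP_SLOTS,
                reduce_slots=REDUCE_SLOTS, heap_mb=HEAP_MB,
                io_sort_mb=IO_SORT_MB):
    """Rough per-node resource budget when every slot is running a task."""
    slots = map_slots + reduce_slots
    return {
        "slots": slots,
        "slots_per_cpu": slots / cpus,                  # > 1 means CPU oversubscription
        "task_heap_mb": slots * heap_mb,                # heap committed by task JVMs
        "heap_share_of_ram": slots * heap_mb / ram_mb,  # excludes daemons and OS
        "sort_buffer_share_of_heap": io_sort_mb / heap_mb,
    }
```

With these numbers, 16 task JVMs share 12 CPUs and commit 16 GB of the 20 GB RAM, and the sort buffer takes half of each map task's heap.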

Input data: 20 GB, divided into 64 files
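As a consistency check: 20 GB split across 64 files gives 320 MB per file, which is smaller than the 512 MB block size. Each file therefore yields one input split and one map task. The sketch below approximates split counting as one split per block, assuming equal-sized files. It ignores the small "slop" tolerance that Hadoop's FileInputFormat applies to the last split.

```python
import math

INPUT_MB = 20 * 1024
NUM_FILES = 64
BLOCK_MB = 512


def map_tasks(input_mb=INPUT_MB, num_files=NUM_FILES, block_mb=BLOCK_MB):
    """Approximate number of map tasks: one split per block of each file."""
    file_mb = input_mb / num_files
    return num_files * math.ceil(file_mb / block_mb)
```

`map_tasks()` returns 64, which matches the number of map tasks in the job.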

Sort example: 64 map tasks, 64 reduce tasks
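Because the task count is fixed at 64 and each node has 8 slots per phase, tasks run in discrete "waves." If tasks take roughly equal time, a phase's duration tracks the number of waves rather than the node count. The Python sketch below computes this under that assumption. It is a simplification that ignores shuffle, disk and network contention, data locality, and stragglers.

```python
import math

TASKS = 64          # map tasks in the job (the reduce side also has 64)
SLOTS_PER_NODE = 8  # map slots per node (reduce slots are also 8)
BASE_NODES = 2      # smallest cluster in the experiment


def waves(nodes, tasks=TASKS, slots_per_node=SLOTS_PER_NODE):
    """Scheduling rounds needed when all tasks take roughly the same time."""
    return math.ceil(tasks / (nodes * slots_per_node))


def wave_speedup(nodes, base=BASE_NODES):
    """Speedup predicted from wave counts alone, relative to the base cluster."""
    return waves(base) / waves(nodes)


# Wave-count speedup next to the linear ideal n / BASE_NODES for 2..9 nodes.
TABLE = {n: (wave_speedup(n), n / BASE_NODES) for n in range(2, 10)}
```

The wave counts are 4 (2 nodes), 3 (3 nodes), 2 (4 to 7 nodes) and 1 (8 and 9 nodes). That gives a stepped predicted speedup of 1, 1.33, 2, 2, 2, 2, 4, 4, compared with the linear 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5.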

Compute nodes: varied from 2 to 9

Why does the speedup behave like this? How can I model it properly?
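One simple model to try is an Amdahl-style form T(n) = a + b / n. It is offered here as an assumption, not a known fit for this job. In this model, a is time that does not shrink with more nodes (e.g. job setup and scheduling overhead) and b is the work that parallelizes. The model is linear in 1/n, so ordinary least squares gives a and b directly. A smooth model like this cannot capture step effects from a fixed task count, so inspecting the residual at each node count may show where such effects occur.

```python
def fit_amdahl(nodes, times):
    """Least-squares fit of T(n) = a + b / n. Returns (a, b)."""
    xs = [1.0 / n for n in nodes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_t = sum(times) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    b = sxt / sxx              # parallelizable work
    a = mean_t - b * mean_x    # non-parallelizable time
    return a, b


def predicted_speedup(n, a, b, base=2):
    """Model speedup of an n-node run relative to the base cluster size."""
    return (a + b / base) / (a + b / n)


def residuals(nodes, times, a, b):
    """Measured minus modelled time for each node count."""
    return [t - (a + b / n) for n, t in zip(nodes, times)]
```

To use it, pass the node counts 2 through 9 and the measured job times (in seconds) to `fit_amdahl`, then compare `predicted_speedup` and `residuals` against the observed curve.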

Thanks

--
Sincerely,
Zhaojie