I varied the number of compute nodes in my cluster and measured the speedup; the result is attached.
As I understand it, there are three types of speedup: linear, sub-linear, and super-linear. However, the curve of my result looks a little strange.
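To make the three terms concrete, here is a minimal sketch of how speedup relative to a baseline run could be computed and classified. The function names, the 5% tolerance, and the runtimes in the usage lines are illustrative assumptions, not my measurements.

```python
def speedup(base_time, time_n):
    """Speedup of a run relative to the baseline run (same workload)."""
    return base_time / time_n


def classify(nodes, time_n, base_nodes, base_time, tol=0.05):
    """Label a run as linear, sub-linear, or super-linear.

    Ideal (linear) speedup is nodes / base_nodes. The tolerance `tol`
    is an arbitrary assumption to absorb measurement noise.
    """
    ideal = nodes / base_nodes
    actual = speedup(base_time, time_n)
    if actual > ideal * (1 + tol):
        return "super-linear"
    if actual < ideal * (1 - tol):
        return "sub-linear"
    return "linear"


# Hypothetical runtimes in seconds, with a 2-node run as the baseline.
print(classify(4, 50.0, 2, 100.0))  # twice the nodes, half the time
print(classify(4, 80.0, 2, 100.0))  # twice the nodes, only 1.25x faster
```

Since my smallest cluster has 2 nodes, the baseline here is the 2-node run rather than a single node.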
The job is the sort program in example.jar; it uses only Hadoop's default MapReduce mechanism.
I use hadoop-1.2.1 with 8 map slots and 8 reduce slots per node (12 CPUs, 20 GB memory).
io.sort.mb = 512, block size = 512 MB, heap size = 1024 MB, reduce.slowstart = 0.05; all other settings are defaults.
Input data: 20 GB, divided into 64 files
Sort example: 64 map tasks, 64 reduce tasks
Computational nodes: varying from 2 to 9
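One factor that may matter with these numbers is how many "waves" of tasks each cluster size needs, since 64 tasks do not divide evenly over every slot count. The sketch below only does that arithmetic. It assumes every slot on every node is usable and that all tasks take roughly the same time, neither of which I have verified; the function name `waves` is just for illustration.

```python
import math

MAP_TASKS = 64       # from the sort job above
REDUCE_TASKS = 64    # from the sort job above
SLOTS_PER_NODE = 8   # 8 map slots and 8 reduce slots per node


def waves(tasks, nodes, slots_per_node=SLOTS_PER_NODE):
    """Number of scheduling rounds needed to run `tasks` tasks
    when `nodes * slots_per_node` of them can run at once."""
    return math.ceil(tasks / (nodes * slots_per_node))


for nodes in range(2, 10):
    print(nodes, waves(MAP_TASKS, nodes), waves(REDUCE_TASKS, nodes))
```

Under these assumptions the wave count changes in steps rather than smoothly as nodes are added, so a stepped curve would not be surprising if wave count dominates the runtime.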
Why does the speedup behave like this? How can I model it properly?