hama-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Spreitzer <mspre...@us.ibm.com>
Subject Re: Comparing BSP and MR
Date Fri, 09 Dec 2011 19:02:19 GMT
It looks to me like the original questions, and the ensuing discussion, 
are a mix of two concerns.  (1) BSP has some abstractions that have been 
realized in many concrete systems, and these abstractions can be compared 
to MapReduce independently of the details of particular implementations. 
(2) is asking for details of the implementations in Giraph and Hama.  I 
think it is very valuable to separate these.  Even for concrete systems 
like Hama and Giraph, the APIs are sufficiently abstract to allow quite a 
variety of implementations.

Regarding the comparison of abstractions, the first thing to note is that 
MapReduce omits two important concepts from BSP, namely iteration and 
per-component state.  To make a very meaningful comparison, you have to 
compare BSP with an iterated use of MapReduce.  Imagine running the same 
MapReduce job over and over again, with the output of one iteration being 
the input to the next.  In this case you can pair up each reduce task 
(except those of the final iteration) of one iteration with a map task of 
the following iteration --- the one that reads the reduce task's output. 
It is that pair that corresponds to a "compute" invocation in BSP.  Note 
that in iterated MapReduce there are two synchronization barriers per 
iteration: one between map and reduce, and one between iterations.  BSP 
has just one barrier per iteration.  Iterated MapReduce externalizes the 
intermediate data that flows from one iteration's reduce tasks to the next 
iteration's map tasks; in the corresponding BSP computation this 
intermediate data is just held inside the compute invocations.

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message