>
> So a map task in MR corresponds to a computation phase in a superstep. Once
> the computation phase for a superstep is complete, the vertex output is
> stored using the defined OutputFormat, the message sent (may be) to another
> vertex and the map task is stopped. Once the barrier synchronization phase
> is complete, another set of map tasks are invoked for the vertices which
> have received a message.
>
Consult Giraph for this purpose; we don't provide this functionality.
> What happens if a particular node is lost in case of Hama and Giraph? Are
> the messages not persisted somewhere to be fetched later.
>
After each superstep, a checkpointer materializes the messages to HDFS, so
they can be fetched again if a node is lost.
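
To illustrate the idea, here is a minimal sketch in Python. It is not Hama's
or Giraph's actual code: the function name run_bsp, the max-value example and
the in-memory "checkpoints" list (a stand-in for files on HDFS) are all
assumptions made for illustration. Each loop iteration is one superstep:
compute, send messages, hit the barrier, then checkpoint the vertex state and
the pending messages.

```python
import copy


def run_bsp(graph, values, inbox=None, max_supersteps=10):
    """Propagate the maximum vertex value through an undirected graph.

    graph:  dict mapping a vertex to the list of its neighbours
    values: dict mapping a vertex to its current value
    inbox:  pending messages per vertex; None means "start from scratch"
    Returns the final values and the list of checkpoints taken.
    """
    values = dict(values)
    if inbox is None:
        # Initial step: every vertex sends its value to its neighbours.
        inbox = {v: [] for v in graph}
        for v, neighbours in graph.items():
            for n in neighbours:
                inbox[n].append(values[v])

    checkpoints = []  # hypothetical stand-in for checkpoint files on HDFS
    superstep = 0
    while any(inbox.values()) and superstep < max_supersteps:
        outbox = {v: [] for v in graph}
        # Computation phase: only vertices that received messages are active.
        for v, messages in inbox.items():
            if not messages:
                continue
            best = max(messages)
            if best > values[v]:
                values[v] = best
                # Communication phase: notify the neighbours of the change.
                for n in graph[v]:
                    outbox[n].append(best)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        superstep += 1
        # Checkpoint after the superstep: persist state and pending messages.
        checkpoints.append((copy.deepcopy(values), copy.deepcopy(inbox)))
    return values, checkpoints


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
start = {"a": 3, "b": 1, "c": 2}
final, checkpoints = run_bsp(graph, start)

# Simulate losing a node after the first superstep: restart from the
# first checkpoint instead of from the original input.
saved_values, saved_inbox = checkpoints[0]
recovered, _ = run_bsp(graph, saved_values, inbox=saved_inbox)
```

Restarting from a checkpoint gives the same final values as the uninterrupted
run, which is the point of persisting the messages along with the state.
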
> It's being done the other way, BSP is implemented in Giraph using Hadoop.
>
Yea, because Google released the MapReduce paper years before the Pregel
paper.
I wonder how things would have turned out the other way around.
2011/12/9 Praveen Sripati <praveensripati@gmail.com>
> Thanks to Thomas and Avery for the response.
>
> > For Giraph you are quite correct, all the stuff is submitted as a MR job.
> But a full map stage is not a superstep, the whole computation is done in
> one mapping phase.
>
> So a map task in MR corresponds to a computation phase in a superstep. Once
> the computation phase for a superstep is complete, the vertex output is
> stored using the defined OutputFormat, the message sent (may be) to another
> vertex and the map task is stopped. Once the barrier synchronization phase
> is complete, another set of map tasks are invoked for the vertices which
> have received a message.
>
> In a regular MR Job (not Giraph) the number of Map tasks equals the
> number of InputSplits. But, in case of Giraph the total number of maps to
> be launched is usually more than the number of input vertices.
>
> Please let me know if I am correct.
>
> > Where are the incoming, outgoing messages and state stored
> > Memory
>
> What happens if a particular node is lost in case of Hama and Giraph? Are
> the messages not persisted somewhere to be fetched later.
>
> > In Giraph, vertices can move around workers between supersteps. A vertex
> will run on the worker that it is assigned to.
>
> Is data locality considered while moving vertices around workers in Giraph?
>
> > As you can see, you could write a MapReduce Engine with BSP on top of
> Apache Hama.
>
> It's being done the other way, BSP is implemented in Giraph using Hadoop.
>
> Praveen
>
> On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching <aching@apache.org> wrote:
>
> > Hi Praveen,
> >
> > Answers inline. Hope that helps!
> >
> > Avery
> >
> > On 12/8/11 10:16 PM, Praveen Sripati wrote:
> >
> > Hi,
> >
> > I know about MapReduce/Hadoop and am trying to get my head around
> > BSP/Hama-Giraph by comparing MR and BSP.
> >
> > - Map Phase in MR is similar to Computation Phase in BSP. BSP allows
> > processes to exchange data in the communication phase, but there is no
> > communication between the mappers in the Map Phase, though the data flows
> > from Map tasks to Reducer tasks. Please correct me if I am wrong. Any other
> > significant differences?
> >
> > I suppose you can think of it that way. I like to compare a BSP superstep
> > to a MapReduce job since it's computation and communication.
> >
> > - After going through the documentation for Hama and Giraph, noticed that
> > they both use Hadoop as the underlying framework. In both Hama and Giraph
> > an MR Job is submitted. Does each superstep in BSP correspond to a Job in
> > MR? Where are the incoming, outgoing messages and state stored - HDFS or
> > HBase or Local or pluggable?
> >
> > My understanding of Hama is that they have their own BSP framework.
> > Giraph can be run on a Hadoop installation, it does not have its own
> > computational framework. A Giraph job is submitted to a Hadoop
> > installation as a Map-only job. Hama will have its own BSP launching
> > framework.
> >
> > In Giraph, the state is stored all in memory. Graphs are loaded/stored
> > through VertexInputFormat/VertexOutputFormat (very similar to Hadoop). You
> > could implement your own VertexInputFormat/VertexOutputFormat to use HDFS,
> > HBase, etc. as your graph stable storage.
> >
> > - If a Vertex is deactivated and again activated after receiving a
> > message, does it run on the same node or a different node in the cluster?
> >
> > In Giraph, vertices can move around workers between supersteps. A vertex
> > will run on the worker that it is assigned to.
> >
> > Regards,
> > Praveen
> >
> >
> >
>
--
Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>