hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Expressing MapReduce with BSP
Date Thu, 05 Jul 2012 07:19:03 GMT
Hi Apurv,

Cool implementation. It also solves a problem the standard WordCount example
has: it emits every word with a frequency of 1, which causes a large
communication overhead between the map and reduce stages.
I would use a Guava Multiset instead of a Java HashMap, because it has the
handy count and auto-increment feature.
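
A minimal sketch of the local-aggregation idea (sometimes called in-mapper
combining), using only java.util so it runs without extra dependencies. With
Guava, the map would be replaced by a HashMultiset, where add(word) increments
and count(word) reads the total. The class and method names below are
hypothetical; this is not the linked WordCount implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalWordCount {
    // Counts words within one input split before anything is sent, so each
    // distinct word yields one (word, count) pair instead of one (word, 1)
    // pair per occurrence.
    static Map<String, Integer> countLocally(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // leading whitespace produces an empty first token
                }
                // merge() stores 1 for a new word or adds 1 to its existing count
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countLocally(List.of("the quick fox", "the lazy dog", "the end"));
        System.out.println(counts.get("the")); // 3
        System.out.println(counts.size());     // 6
    }
}
```

Only the aggregated map needs to be sent to the reducing side, so the message
volume grows with the number of distinct words rather than with the total
number of words.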

Why take on the overhead of merging and sorting yourself? You could use the
sorted message queue in Hama 0.5.0. It isn't disk-based, so you won't get the
scalability you are targeting, but it would drastically reduce the complexity
of your code.
If you are working on it anyway, you could create a disk-based sorted queue
that merges the messages implicitly.
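
A rough sketch of the merge step such a disk-based sorted queue would perform
internally: a k-way merge of already-sorted runs using a priority queue. In a
real disk-based queue each run would be a sorted spill file read sequentially;
here in-memory lists stand in for those files. SortedRunMerger, Head, and merge
are hypothetical names, not part of Hama.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SortedRunMerger {
    // Current head element of one run plus an iterator over the rest of it.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Merges runs that are each already sorted into a single sorted list.
    // The heap only ever holds one element per run.
    static <T extends Comparable<T>> List<T> merge(List<List<T>> runs) {
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (List<T> run : runs) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head<>(it.next(), it));
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();
            out.add(smallest.value);
            // Refill from the run that just supplied the smallest element.
            if (smallest.rest.hasNext()) {
                heap.add(new Head<>(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = merge(List.of(
            List.of("apple", "kiwi", "pear"),
            List.of("banana", "fig"),
            List.of("cherry", "plum")));
        System.out.println(merged); // [apple, banana, cherry, fig, kiwi, pear, plum]
    }
}
```

Once the messages come out in sorted order, equal words are adjacent, so the
reducing side can sum consecutive counts without holding every word in memory.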

2012/7/5 Praveen Sripati <praveensripati@gmail.com>

> Apurv,
>
> Not sure if you have seen this paper or not, but it concludes that
> effectively all MR jobs can be expressed as BSP jobs and vice versa. It
> also discusses when to choose BSP over MR and vice versa.
>
> http://arxiv.org/abs/1203.2081
>
> Thanks,
> Praveen
>
>
> On Thu, Jul 5, 2012 at 1:43 AM, Apurv Verma <dapurv5@gmail.com> wrote:
>
> > Hello,
> > Here is a simple WordCount example I wrote with Hama. There are a few
> > TODOs left, but it works fine. It will be fully scalable once all TODOs
> > are complete.
> >
> > http://code.google.com/p/anahad/source/browse/trunk/src/main/java/org/anahata/bsp/WordCount.java
> >
> > Comments welcome :)
> >
> > --
> > thanks and regards,
> >
> > Apurv Verma
> > India
> >
>
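
To make the MR-to-BSP mapping discussed above concrete, here is a minimal,
framework-free simulation of WordCount as two BSP supersteps, assuming one
simulated peer per input split and hash partitioning of words to peers. It
does not use the Hama API and is not the linked implementation;
BspWordCountSimulation, Partial, and run are hypothetical names. The point
between the two loops stands in for the barrier that a BSP runtime (in Hama,
the peer's sync() call) would provide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BspWordCountSimulation {
    // A message carrying a partial count from one peer to another.
    static final class Partial {
        final String word;
        final int count;

        Partial(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Returns, for each simulated peer, the final counts of the words it owns.
    static List<Map<String, Integer>> run(List<List<String>> splits) {
        int numPeers = splits.size();
        List<List<Partial>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ArrayList<>());
        }

        // Superstep 1 ("map"): each peer counts its own split locally, then
        // sends one partial count per distinct word to the word's owner peer.
        for (List<String> split : splits) {
            Map<String, Integer> local = new HashMap<>();
            for (String word : split) {
                local.merge(word, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : local.entrySet()) {
                int owner = Math.floorMod(e.getKey().hashCode(), numPeers);
                inboxes.get(owner).add(new Partial(e.getKey(), e.getValue()));
            }
        }

        // Barrier: a real BSP runtime blocks every peer here until all
        // messages from superstep 1 have been delivered.

        // Superstep 2 ("reduce"): each peer sums the partial counts it received.
        List<Map<String, Integer>> results = new ArrayList<>();
        for (List<Partial> inbox : inboxes) {
            Map<String, Integer> totals = new HashMap<>();
            for (Partial p : inbox) {
                totals.merge(p.word, p.count, Integer::sum);
            }
            results.add(totals);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> perPeer = run(List.of(
            List.of("a", "b", "a"),
            List.of("b", "c"),
            List.of("a", "c", "c")));
        Map<String, Integer> all = new HashMap<>();
        perPeer.forEach(all::putAll); // each word is owned by exactly one peer
        System.out.println(all.get("a") + " " + all.get("b") + " " + all.get("c")); // 3 2 3
    }
}
```

The map phase becomes local computation plus message sends, the shuffle
becomes the barrier, and the reduce phase becomes processing of the received
messages in the next superstep.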
