hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: evaluating Hama
Date Mon, 13 Sep 2010 01:25:39 GMT
I estimate one to two months until the 0.2.0 release.

If an input/output system and a fault-tolerance mechanism are added to
the BSP package in the future, a graph-specific programming model and
framework will be easy to implement. I think we can implement the
input/output system and the FT mechanism within this year.

> Do you know of any alternate parallel graph processing frameworks similar to
> Pregel & Hama?

Nope, but I have used BSPLib to simulate the Pregel concept. It might
be a good solution for you.

> About the data split -- if, for example, there are 10 nodes in the cluster
> and the data is divided into 10 splits (split-1 to split-10), then can we
> control which split goes to which node as local data? In the case of MR splits,
> this cannot be controlled, can it? Can we do that here?

I understand your question.

It depends on how the data structure is designed, and on how and where
the data is stored, organized, and re-used. We don't have a plan for a
graph data store yet.

Thanks. :)

On Fri, Sep 10, 2010 at 1:38 PM, Raghava Mutharaju
<m.vijayaraghava@gmail.com> wrote:
> Hello Edward,
>
> Thank you for the reply. Please correct me if I am wrong about what I am
> going to say.
>
> If the BSP computing framework is in place, how much more work would it
> be to place a graph processing framework on top of it? I guess some parts of
> the graph processing framework (Angrapa) are in place?
>
> While I was searching for parallel graph processing frameworks, I came
> across Pregel and also Hama :). Pregel development would have taken a lot of
> time, and Hama is just starting out, so it would be unrealistic to make it as
> robust, with as many features, as Pregel, but it would be great to have
> something in place to test out my ideas.
>
> When is the release of 0.2.0 scheduled?
>
> Do you know of any alternate parallel graph processing frameworks similar to
> Pregel & Hama?
>
> About the data split -- if, for example, there are 10 nodes in the cluster
> and the data is divided into 10 splits (split-1 to split-10), then can we
> control which split goes to which node as local data? In the case of MR splits,
> this cannot be controlled, can it? Can we do that here?
>
> Thank you.
>
> Regards,
> Raghava.
>
> On Thu, Sep 9, 2010 at 10:47 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
>> Hello,
>>
>> > 1) What is the status of the project, specifically the graph processing
>> > part (Angrapa?). Is it sufficiently stable to be used? Although this is an
>> > academic research project, it would be better to work on a stable one.
>>
>> At present, we're focusing on a framework for more general-purpose
>> BSP computing, so we are still far from a graph processing framework
>> such as Google Pregel.
>>
>> We have a release plan for version 0.2.0 and we're working on it. The
>> 0.2.0 release will include:
>>
>>  * BSP computing framework (no fault-tolerance mechanism, no data
>>    input-output API)
>>  * and its examples
>>
>> > 2) I haven't come across any installation/building steps for Hama. How to
>> > integrate with HDFS/HBase?
>>
>> We'll create an input-output system that can be used to process data.
>> You can think of it as something like an M/R computing framework on
>> HDFS/HBase.
>>
>> > 3) Are there more extensive performance tests say w.r.t the latest branch
>> > of development? Do they have better performance?
>>
>> Not yet.
>>
>> > 4) Can the data assigned to each partition (cluster) be split according
>> > to some condition i.e. can it be controlled unlike a MR split?
>>
>> Do you mean whether it can assign tasks to slaves according to some
>> other condition (not based on data locality)? If so, no.
>>
>> All splits should be loaded and computed locally. Otherwise, it
>> will cause a huge, pointless data-copy overhead among servers.
>>
>> Thanks :)
>>
>> On Fri, Sep 10, 2010 at 7:09 AM, Raghava Mutharaju
>> <m.vijayaraghava@gmail.com> wrote:
>> > Hi all,
>> >
>> > I am working on a research project where I faced the issues that formed the
>> > motivation for Hama (Hamburg) -- the splits in the data depend on each other
>> > and data locality issue in case of multiple MR iterations. I was thinking of
>> > checking other alternatives to MR when I came across Hama. I am in the
>> > process of checking whether Hama would fit our project needs and I need your
>> > help in that regard.
>> >
>> > I am interested in the graph processing part of Hama.
>> >
>> > I have the following questions
>> >
>> > 1) What is the status of the project, specifically the graph processing
>> > part (Angrapa?). Is it sufficiently stable to be used? Although this is an
>> > academic research project, it would be better to work on a stable one.
>> > 2) I haven't come across any installation/building steps for Hama. How to
>> > integrate with HDFS/HBase?
>> > 3) Are there more extensive performance tests say w.r.t the latest branch
>> > of development? Do they have better performance?
>> > 4) Can the data assigned to each partition (cluster) be split according
>> > to some condition i.e. can it be controlled unlike a MR split?
>> >
>> > Thank you.
>> >
>> > Regards,
>> > Raghava.
>> >
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
