hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: [ANNOUNCEMENT] A query system for BSP processing
Date Fri, 24 Aug 2012 05:33:24 GMT
Hi Leonidas!

I have to admit that I knew this was in the works (and had to keep
silent), but I have to say: thank you very much!
This will help many people write BSPs more easily.

Of course this is not as fast as native BSP code; Hive and Pig suffer
from the same problem in MR.
But it gives people the opportunity to develop faster and get their code
into production with only a minor time investment.

And I think we will gladly help you improve the BSP part of your
framework. At least I would ;)

Thanks!

2012/8/24 Edward J. Yoon <edwardyoon@apache.org>

> Here are a few test results of mine on Oracle BDA (40G/s InfiniBand network).
> It seems slower than our PageRank example.
>
> P.S. There were some errors, so I couldn't test at large scale.
> (java.lang.ClassCastException: hadoop.mrql.MR_int cannot be cast to
> hadoop.mrql.Inv and java.lang.Error: Cannot clear a non-materialized
> sequence ..., etc.)
>
>
>
> == 100K nodes and 1M edges ==
>
> *** Using 10 BSP tasks (out of a max 10). Each task will handle about
> 2383611 bytes of input data.
>
> Run time: 30.384 secs
>
> *** Using 20 BSP tasks (out of a max 20). Each task will handle about
> 1191805 bytes of input data.
>
> Run time: 24.412 secs
>
> On Fri, Aug 24, 2012 at 9:36 AM, Edward J. Yoon <edwardyoon@apache.org>
> wrote:
> > Wow, very interesting. I'm going to install and test on my large cluster.
> >
> > On Fri, Aug 24, 2012 at 4:41 AM, Leonidas Fegaras <fegaras@cse.uta.edu> wrote:
> >> Dear Hama users,
> >> I am pleased to announce that the MRQL query processing system can now
> >> evaluate SQL-like queries on a Hama cluster. MRQL is available at:
> >>
> >> http://lambda.uta.edu/mrql/
> >>
> >> MRQL (the Map-Reduce Query Language) is an SQL-like query language for
> >> large-scale, distributed data analysis. MRQL is powerful enough to
> >> express most common data analysis tasks over many different kinds of
> >> raw data, including hierarchical data and nested collections, such as
> >> XML data. MRQL can run in two modes: in MR (Map-Reduce) mode using
> >> Apache Hadoop and in BSP (Bulk Synchronous Parallel) mode using Apache
> >> Hama. Both modes use Apache's HDFS to read and write their data.
> >>
> >> Note that the BSP mode is currently experimental (not yet fine-tuned)
> >> and lacks any fault tolerance (if an error occurs, the entire job must
> >> be restarted). Due to our limited resources, MRQL has only been tested
> >> on a small cluster (7 nodes/28 cores). We compared the BSP mode with
> >> the MR mode by evaluating a PageRank query over a small graph (100K
> >> nodes, 1M edges) and found that BSP mode is about 4.5 times faster
> >> than the MR mode. Please let me know if you'd like to contribute to
> >> this project by testing MRQL on a larger cluster.
> >> Best regards,
> >> Leonidas Fegaras
> >> University of Texas at Arlington
> >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
