hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: [ANNOUNCEMENT] A query system for BSP processing
Date Fri, 24 Aug 2012 05:17:08 GMT
Here are a few test results from Oracle BDA (40G/s InfiniBand network).
It seems slower than our PageRank example.

P.S. I ran into some errors, so I couldn't test at a larger scale
(java.lang.ClassCastException: hadoop.mrql.MR_int cannot be cast to
hadoop.mrql.Inv and java.lang.Error: Cannot clear a non-materialized
sequence ..., etc.).



== 100K nodes and 1M edges ==

*** Using 10 BSP tasks (out of a max 10). Each task will handle about
2383611 bytes of input data.

Run time: 30.384 secs

*** Using 20 BSP tasks (out of a max 20). Each task will handle about
1191805 bytes of input data.

Run time: 24.412 secs
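
For reference, a rough sketch of the scaling arithmetic for the two runs above. The run times and task counts are copied from the output; the names (RUNS, scaling) are only illustrative and are not part of MRQL or Hama. With just one run per configuration, treat the numbers as indicative only.

```python
# Run times copied from the two runs above: BSP tasks -> seconds.
RUNS = {10: 30.384, 20: 24.412}


def scaling(runs):
    """Return (tasks, seconds, speedup, efficiency) relative to the smallest run."""
    base_tasks = min(runs)
    base_time = runs[base_tasks]
    rows = []
    for tasks in sorted(runs):
        secs = runs[tasks]
        speedup = base_time / secs                    # how much faster than the baseline
        efficiency = speedup / (tasks / base_tasks)   # speedup per unit of added tasks
        rows.append((tasks, secs, speedup, efficiency))
    return rows


results = scaling(RUNS)
for tasks, secs, speedup, efficiency in results:
    print(f"{tasks} tasks: {secs:.3f} s, speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Doubling the tasks from 10 to 20 gives roughly a 1.24x speedup (about 62% efficiency), which is plausible for an input this small, since each task only handles about 1-2 MB.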

On Fri, Aug 24, 2012 at 9:36 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> Wow, very interesting. I'm going to install and test on my large cluster.
>
> On Fri, Aug 24, 2012 at 4:41 AM, Leonidas Fegaras <fegaras@cse.uta.edu> wrote:
>> Dear Hama users,
>> I am pleased to announce that the MRQL query processing system can now
>> evaluate SQL-like queries on a Hama cluster. MRQL is available at:
>>
>> http://lambda.uta.edu/mrql/
>>
>> MRQL (the Map-Reduce Query Language) is an SQL-like query language for
>> large-scale, distributed data analysis. MRQL is powerful enough to
>> express most common data analysis tasks over many different kinds of
>> raw data, including hierarchical data and nested collections, such as
>> XML data. MRQL can run in two modes: in MR (Map-Reduce) mode using
>> Apache Hadoop and in BSP (Bulk Synchronous Parallel) mode using Apache
>> Hama. Both modes use Apache's HDFS to read and write their data.
>>
>> Note that the BSP mode is currently experimental (not fine-tuned yet)
>> and lacks any fault-tolerance (if an error occurs, the entire job must
>> be restarted). Due to our limited resources, MRQL has only been tested
>> on a small cluster (7-nodes/28-cores). We compared the BSP mode with
>> the MR mode by evaluating a pagerank query over a small graph (100K
>> nodes, 1M edges) and found that BSP mode is about 4.5 times faster
>> than the MR mode. Please let me know if you'd like to contribute to
>> this project by testing MRQL on a larger cluster.
>> Best regards,
>> Leonidas Fegaras
>> University of Texas at Arlington
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
