hama-user mailing list archives

From Leonidas Fegaras <fega...@cse.uta.edu>
Subject [ANNOUNCEMENT] A query system for BSP processing
Date Thu, 23 Aug 2012 19:41:47 GMT

Dear Hama users,

I am pleased to announce that the MRQL query processing system can now
evaluate SQL-like queries on a Hama cluster. MRQL is available at:

http://lambda.uta.edu/mrql/

MRQL (the Map-Reduce Query Language) is an SQL-like query language for
large-scale, distributed data analysis. MRQL is powerful enough to
express most common data analysis tasks over many different kinds of
raw data, including hierarchical data and nested collections, such as
XML data. MRQL can run in two modes: in MR (Map-Reduce) mode using
Apache Hadoop and in BSP (Bulk Synchronous Parallel) mode using Apache
Hama. Both modes use Apache's HDFS to read and write their data.

Note that the BSP mode is currently experimental (not yet fine-tuned)
and lacks fault tolerance (if an error occurs, the entire job must
be restarted). Due to our limited resources, MRQL has only been tested
on a small cluster (7 nodes, 28 cores). We compared the BSP mode with
the MR mode by evaluating a PageRank query over a small graph (100K
nodes, 1M edges) and found that the BSP mode is about 4.5 times faster
than the MR mode. Please let me know if you'd like to contribute to
this project by testing MRQL on a larger cluster.

Best regards,
Leonidas Fegaras
University of Texas at Arlington

