drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Asaf Mesika <asaf.mes...@gmail.com>
Subject Shuffle operator in Apache Drill / Impala / Hive / Shark
Date Fri, 09 May 2014 09:18:25 GMT

I'm researching the different flavors of MPPs and had trouble understanding
Shuffle differences between Apache Drill, Impala and Hive (essentially
M/R), and thought you guys might be able to help me a bit.

Suppose you are doing a classic SELECT ip, count(*) FROM ip_events GROUP BY
ip query.
In Hive, the mapper task will do the local group by (combiner) on a
specific hdfs block (input data), and copy the resulting group by from the
mapper task to a reducer task machine. Once the reducer task has all mapper
grouped by data, it then performs its own group by to yield the final

In Apache Drill (here you need to correct me if I'm wrong) - the data is
read from hdfs block on its data-node, grouped by locally and once its
complete (spilled to disk if can't fit in RAM), the records are transmitted
(exchange operator) to the node going the final group by. This node is not
waiting for "mapper" nodes to wrap up, but starts aggregating as soon as it
receives the records (spill to disk if data can't fit in RAM).

In this approach, failure recovery in case one "mapper" node goes down is
impossible since you already aggregated parts of the data and it can't be
separated, thus re-running this "mapper" task again will skew the results.

In Shark, in contrast, the "mapper" task group by results are transferred
from mapper node to reducer node, but kept in memory (spilled to disk if
needed), and once ALL mapper tasks data has arrived, the final reducer
group by aggregation begins.
This approach allows for failure recovery - if one mapper node drops dead
mid-query you can re-run it on another mapper node, and continue.

So my questions are:
1. Is the above correct?
2. Does Impala works like Apache Drill as I described it above?

Thanks alot !


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message