drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: Drill: Memory Spilling for the Hash Aggregate Operator
Date Tue, 17 Jan 2017 00:09:34 GMT
Does the data need to be written into a disk-friendly format when a partition is selected to
be written to disk? If you are careful in your choice of format then it doesn’t need to
be re-written. And in fact you can start with the assumption that everything is going to disk.

One of the most beautiful innovations of the HHJ paper [1] is the histogram-driven partitioning.
Basically, during phase 1 you apply the phase 2 hash function to assign rows to “sub-partitions”.
Partition 0 would contain sub-partitions 0.0, 0.1, … 0.n; partition 1 would contain sub-partitions
1.1, …, 1.n. The rows are all mixed together in each partition, but you know how many rows
(and bytes) are in each sub-partition. If partition 0 (or any partition) ends up larger than
memory then you are going to need a phase 3. But you can enter phase 2 armed with some very
useful knowledge. You know the sizes of the sub-partitions and you can choose a hash function
in phase 2 such that many of the partitions end up *just* smaller than memory.

The big problem with external sort and hash algorithms is the huge performance hit when you
require an extra phase. If you need 2 phases, HHJ can convert that to 1.5 phases (by pulling
smaller partitions back into memory) and by optimizing the assignment of rows to partitions
it can turn a 3.1 phase query into a 2.9 phase query - a big win.


[1] https://pdfs.semanticscholar.org/fc1c/78cbef5062cf49fdb309b1935af08b759d2d.pdf <https://pdfs.semanticscholar.org/fc1c/78cbef5062cf49fdb309b1935af08b759d2d.pdf>

> On Jan 14, 2017, at 7:34 AM, Boaz Ben-Zvi <bben-zvi@mapr.com> wrote:
>  Sorry for no attachment (Apache mail rules) -- Here is a link to the document:
> DrillSpillmemoryforHashAggregation.pdf - https://drive.google.com/file/d/0ByUg32jfEW16ajNiQlVRczhPTjA/view?usp=sharing
> [https://lh3.googleusercontent.com/U9FNbWEBljT-HDRBE1-vhMnE4Ug5YFgutztvbys2UnTiVp-FQX6mzQ=w1200-h630-p]<https://drive.google.com/file/d/0ByUg32jfEW16ajNiQlVRczhPTjA/view?usp=sharing>
> DrillSpillmemoryforHashAggregation.pdf<https://drive.google.com/file/d/0ByUg32jfEW16ajNiQlVRczhPTjA/view?usp=sharing>
> drive.google.com
>    -- Boaz
> ________________________________
> From: Julian Hyde <jhyde@apache.org>
> Sent: Friday, January 13, 2017 11:00 PM
> To: dev@drill.apache.org
> Subject: Re: Drill: Memory Spilling for the Hash Aggregate Operator
> The attachment didn't come through. I'm hoping that you settled on a "hybrid" hash algorithm
that can write to disk, or write to memory, and the cost of discovering that is wrong is not
too great. With Goetz Graefe's hybrid hash join (which can be easily adapted to hybrid hash
aggregate) if the input ALMOST fits in memory you could process most of it in memory, then
revisit the stuff you spilled to disk.
>> On Jan 13, 2017, at 7:46 PM, Boaz Ben-Zvi <bben-zvi@mapr.com> wrote:
>> Hi Drill developers,
>>     Attached is a document describing the design for memory spilling implementation
for the Hash Aggregate operator.
>>     Please send me any comments or questions,
>>        -- Boaz

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message