drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Boaz Ben-Zvi <bben-...@mapr.com>
Subject Re: Drill: Memory Spilling for the Hash Aggregate Operator
Date Mon, 16 Jan 2017 21:10:09 GMT
  Hi Julian,

      Yes, the model used for the memory spilling is the “hybrid” one — the memory space
is pre-divided into “partitions” (based on the hash values of the key columns), and then
as needed only some of the partitions would end up spilled to secondary storage (e.g. HDD),
hence after all the input is read, some of the partitions would still be fully in memory (the
term chosen for those was “pristine”); hence for the data in those partitions there would
be no extra overhead. The remaining partitions would incur the needed IO overhead.
   In the worst case all the partitions would end up spilled; in a typical case only some
of them; whenever the memory limit is reached — the decision on which partition to spill
next is a delicate one — choosing a “pristine” partition would increase the total overhead,
vs. choosing a previously spilled partition which yields little memory (and increases disk
seeks later).
   All this is explained in the document; hope it is accessible now.

       And the same model should be used for the Hash Join later on.

               — Boaz

> On Jan 13, 2017, at 11:00 PM, Julian Hyde <jhyde@apache.org> wrote:
> 
> The attachment didn’t come through. I’m hoping that you settled on a “hybrid”
hash algorithm that can write to disk, or write to memory, and the cost of discovering that
is wrong is not too great. With Goetz Graefe’s hybrid hash join (which can be easily adapted
to hybrid hash aggregate) if the input ALMOST fits in memory you could process most of it
in memory, then revisit the stuff you spilled to disk.
> 
>> On Jan 13, 2017, at 7:46 PM, Boaz Ben-Zvi <bben-zvi@mapr.com> wrote:
>> 
>> Hi Drill developers,
>> 
>>     Attached is a document describing the design for memory spilling implementation
for the Hash Aggregate operator.
>> 
>>     Please send me any comments or questions,
>> 
>>        -- Boaz
> 

Mime
View raw message