drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abdel Hakim Deneche <adene...@maprtech.com>
Subject [DISCUSSION] How can we improve the performance of Window Functions
Date Thu, 11 Jun 2015 22:55:04 GMT
Hi all,

The purpose of this email is to describe how window functions are computed
and to try to come up with "better" ways to do it.

DRILL-3200 <https://issues.apache.org/jira/browse/DRILL-3200> added support
some significant  improvements to the way Drill computes window functions.

The general idea was to update the code to only support the default frame
which makes it run faster and use less memory.

WindowFrameRecordBatch works similarly to StreamingAggregate: it requires
the data to be sorted on the partition and order by columns and only
computes one frame at a time. With the default frame we only need to
aggregate every row only once.
Memory consumption depend on the data, but in general each record batch is
kept in memory until we are ready to process all it's rows (which is
possible when we find the last peer row of the batch's last row). Drill's
external sort can spill to disk if data is too big, and we only need to
keep at most one partition's worth of data in memory for the window
functions to be computed (when over clause doesn't contain an order by)

Each time a batch is ready to be processed we do the following:

1- we start with it's first row (current row)
2- we compute the length of the current row's frame (in this case we find
the number of peer rows for the current row),
3- we aggregate (this includes computing the window function values) all
rows of the current frame
4- we write the aggregated value in each row of the current frame.
5- We then move to the 1st non peer row which becomes the current row
6- if we didn't reach the end of the current batch go back to 2

With all this in mind, how can we improve the performance of window
functions ?


Abdelhakim Deneche

Software Engineer


Now Available - Free Hadoop On-Demand Training

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message