drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-690) Create 2-Phase aggregate plans for SUM, MIN, MAX
Date Mon, 12 May 2014 16:05:15 GMT

     [ https://issues.apache.org/jira/browse/DRILL-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Aman Sinha updated DRILL-690:

    Attachment: 0001-Generate-2-phase-plans-for-Hash-Aggr-and-Streaming-A.patch

> Create 2-Phase aggregate plans for SUM, MIN, MAX
> ------------------------------------------------
>                 Key: DRILL-690
>                 URL: https://issues.apache.org/jira/browse/DRILL-690
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>         Attachments: 0001-Generate-2-phase-plans-for-Hash-Aggr-and-Streaming-A.patch
> Currently, Drill generates 1-phase plans for aggregations with group-by: an initial
> distribution (if necessary) followed by either a sort + streaming aggregate or a hash
> aggregate.  In many cases, we should be able to do a 2-phase aggregation instead:
> Phase 1: local grouped aggregation, potentially collapsing the input to a small
>          number of groups
> Intermediate step: hash distribution (on grouping keys)
> Phase 2: final aggregation
> The amount of data transferred over the network can potentially be much smaller than
> with the 1-phase approach.
> For aggregates such as SUM, MIN and MAX, both phase 1 and phase 2 apply exactly the same
> aggregate function; for other aggregate functions such as COUNT, however, the first phase
> has to do a count and the second phase must SUM the counts.  This enhancement addresses
> only the functions SUM, MIN and MAX.
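
The two-phase scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Drill's planner or operator code: the names local_aggregate, hash_distribute, two_phase and two_phase_count are hypothetical, each "fragment" is simply a list of (key, value) rows held on one node, and the network exchange is simulated with in-memory lists.

```python
# Hypothetical helper names for illustration; none of this is Drill's actual API.
COMBINERS = {"SUM": lambda a, b: a + b, "MIN": min, "MAX": max}


def local_aggregate(rows, func):
    """Group (key, value) rows and fold each group with the given function."""
    combine = COMBINERS[func]
    partial = {}
    for key, value in rows:
        partial[key] = value if key not in partial else combine(partial[key], value)
    return partial


def hash_distribute(partials, num_targets):
    """Intermediate step: route each partial group to a target by hash of its key."""
    targets = [[] for _ in range(num_targets)]
    for partial in partials:
        for key, value in partial.items():
            targets[hash(key) % num_targets].append((key, value))
    return targets


def two_phase(fragments, func, num_targets=2):
    """Phase 1 per fragment, hash distribution, then phase 2 per target."""
    partials = [local_aggregate(rows, func) for rows in fragments]
    result = {}
    for received in hash_distribute(partials, num_targets):
        # For SUM, MIN and MAX, phase 2 applies the same function as phase 1.
        # Each key lands on exactly one target, so the merged dicts never clash.
        result.update(local_aggregate(received, func))
    return result


def two_phase_count(fragments, num_targets=2):
    """COUNT differs: phase 1 counts rows per key, phase 2 must SUM the counts."""
    # Counting is expressed here as a sum of ones in phase 1.
    ones = [[(key, 1) for key, _ in rows] for rows in fragments]
    return two_phase(ones, "SUM", num_targets)


if __name__ == "__main__":
    fragments = [[("a", 1), ("b", 5), ("a", 3)], [("a", 10), ("b", 2)]]
    print(two_phase(fragments, "SUM"))
    print(two_phase_count(fragments))
```

In this sketch only the per-fragment partial groups pass through hash_distribute, so the number of rows exchanged is bounded by the number of distinct keys per fragment rather than by the number of input rows.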

This message was sent by Atlassian JIRA
