drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-690) Create 2-Phase aggregate plans for SUM, MIN, MAX
Date Mon, 12 May 2014 01:11:16 GMT
Aman Sinha created DRILL-690:

             Summary: Create 2-Phase aggregate plans for SUM, MIN, MAX
                 Key: DRILL-690
                 URL: https://issues.apache.org/jira/browse/DRILL-690
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Aman Sinha

Currently, Drill generates 1-phase plans for aggregations with group-by: an initial
distribution (if necessary), followed by either a sort + streaming aggregate or a hash aggregate.
In many cases, we should be able to do a 2-phase aggregation instead:
Phase 1: local grouped aggregation, potentially collapsing the input to
               a small number of groups
Intermediate step: hash distribution (on the grouping keys)
Phase 2: final aggregation

The amount of data transferred over the network can potentially be much smaller than with
the 1-phase approach, because each node sends at most one row per group instead of every input row.
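
The three steps can be sketched in a few lines of Python. This is only an illustration of the
idea, not Drill's planner or operator code: the function names, the list-of-lists model of
"nodes" and the sample rows are all assumptions made for the example.

```python
import operator

def local_aggregate(rows, combine):
    """Collapse (key, value) pairs to one value per key using `combine`."""
    partial = {}
    for key, value in rows:
        partial[key] = combine(partial[key], value) if key in partial else value
    return partial

def hash_distribute(partials, num_nodes):
    """Intermediate step: send each (key, partial value) to the node that owns the key."""
    buckets = [[] for _ in range(num_nodes)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_nodes].append((key, value))
    return buckets

def two_phase_aggregate(node_rows, combine):
    # Phase 1: each node aggregates its own rows locally.
    partials = [local_aggregate(rows, combine) for rows in node_rows]
    # Intermediate step: hash-distribute the partial results on the grouping key.
    buckets = hash_distribute(partials, len(node_rows))
    # Phase 2: each node applies the same function to the partials it received.
    final = {}
    for bucket in buckets:
        final.update(local_aggregate(bucket, combine))
    rows_shuffled = sum(len(bucket) for bucket in buckets)
    return final, rows_shuffled

# Hypothetical input: two nodes, three rows each, two distinct groups.
node_rows = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

sum_result, sum_shuffled = two_phase_aggregate(node_rows, operator.add)
min_result, _ = two_phase_aggregate(node_rows, min)
max_result, _ = two_phase_aggregate(node_rows, max)
```

With this input only 4 partial rows cross the "network", where a 1-phase plan would distribute
all 6 input rows. The saving grows with the number of rows per group on each node.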

For aggregates such as SUM, MIN and MAX, both phases apply exactly the same aggregate
function. For other aggregate functions such as COUNT, however, the first phase must do a
COUNT and the second phase must SUM the counts. This enhancement addresses only the
functions SUM, MIN and MAX.
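
To illustrate why COUNT needs different handling, here is a small Python sketch of a 2-phase
COUNT. It is not part of this enhancement and not Drill code; the helper names and sample rows
are hypothetical.

```python
from collections import Counter

def local_count(rows):
    """Phase 1: count the rows in each group on one node."""
    return Counter(key for key, _ in rows)

def final_count(partial_counts):
    """Phase 2: SUM the per-node counts for each group."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds the counts together
    return dict(total)

def wrong_final_count(partial_counts):
    """Applying COUNT again in phase 2 counts partial rows, not input rows."""
    return dict(Counter(key for partial in partial_counts for key in partial))

# Hypothetical input: two nodes, three rows each, two distinct groups.
node_rows = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

partials = [local_count(rows) for rows in node_rows]
counts = final_count(partials)
wrong_counts = wrong_final_count(partials)
```

Here `counts` gives 3 rows for each group, while `wrong_counts` gives 2, which is just the
number of nodes that saw the group.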

This message was sent by Atlassian JIRA
