drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-690) Create 2-Phase aggregate plans for SUM, MIN, MAX
Date Mon, 12 May 2014 16:07:15 GMT

    [ https://issues.apache.org/jira/browse/DRILL-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995198#comment-13995198 ]

Aman Sinha commented on DRILL-690:

Uploaded the patch to support multi-phase (in this case, 2-phase) aggregates using HashAggr
and StreamingAggr for the functions SUM, MIN and MAX. As part of this patch, I have also added
planner options to enable/disable hash aggregation, streaming aggregation, multi-phase
aggregation, hash join and merge join.
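The patch covers both aggregate operators, HashAggr and StreamingAggr. As a rough illustration of how the two differ (a minimal Python sketch, not Drill's Java operators; the function names are hypothetical), a hash aggregate keeps a running value per key in a hash table, while a streaming aggregate relies on input sorted on the grouping key and emits a group each time the key changes:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_agg_sum(rows):
    """Hash aggregate: one pass over unsorted input, one running SUM per key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def streaming_agg_sum(rows):
    """Streaming aggregate: needs input sorted on the grouping key.

    groupby() only merges *adjacent* equal keys, which is why the sort
    (the "sort" in "sort + streaming aggregate") must come first.
    """
    ordered = sorted(rows, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


rows = [("b", 2), ("a", 1), ("b", 5), ("c", 4), ("a", 3)]
assert hash_agg_sum(rows) == dict(streaming_agg_sum(rows)) == {"a": 4, "b": 7, "c": 4}
```

Both produce the same groups; the trade-off is memory for the hash table versus the cost of sorting (or reusing an existing sort order).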

> Create 2-Phase aggregate plans for SUM, MIN, MAX
> ------------------------------------------------
>                 Key: DRILL-690
>                 URL: https://issues.apache.org/jira/browse/DRILL-690
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>         Attachments: 0001-Generate-2-phase-plans-for-Hash-Aggr-and-Streaming-A.patch
> Currently, Drill generates 1-phase plans for aggregations with group-by where we do an
> initial distribution (if necessary) followed by either a sort + streaming aggregate or a hash
> aggregate.  In many cases, we should be able to do a 2-phase aggregation:
> Phase 1: local grouped aggregation first, potentially collapsing the input to a small number of groups,
> Intermediate step: hash distribution (on the grouping keys),
> Phase 2: final aggregation.
> The amount of data transferred over the network can potentially be much smaller than with
> the 1-phase approach.
> For aggregates such as SUM, MIN and MAX, phases 1 and 2 apply exactly the same aggregate
> function; however, for other aggregate functions such as COUNT, the first phase has to do a
> count and the second phase must SUM the counts.  In this particular enhancement, we will only
> address the functions SUM, MIN and MAX.
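The 2-phase plan described in the issue (local aggregation, hash distribution on the grouping keys, final aggregation) can be sketched in Python as below. This is a minimal model under stated assumptions, not Drill's implementation: fragments are plain lists of (key, value) rows, the helper names are hypothetical, and for brevity all senders' output is distributed through one call rather than one exchange per sender. The second return value counts rows sent over the "network", which is where the 2-phase plan can win:

```python
from collections import defaultdict


def aggregate_sum(rows):
    """SUM per grouping key; for SUM the same function serves both phases."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def hash_distribute(rows, num_receivers):
    """Route each row to receiver hash(key) % num_receivers, so all rows
    for a given key meet on the same receiver."""
    inbound = [[] for _ in range(num_receivers)]
    for key, value in rows:
        inbound[hash(key) % num_receivers].append((key, value))
    return inbound


def one_phase_sum(fragments, num_receivers):
    """Distribute every raw row, then aggregate once on each receiver."""
    raw = [row for frag in fragments for row in frag]
    result = {}
    for bucket in hash_distribute(raw, num_receivers):
        result.update(aggregate_sum(bucket))  # receivers own disjoint key sets
    return result, len(raw)  # (result, rows shipped)


def two_phase_sum(fragments, num_receivers):
    """Phase 1: local SUM per fragment; distribute only the partial rows;
    Phase 2: final SUM of the partials on each receiver."""
    partials = [row for frag in fragments for row in aggregate_sum(frag).items()]
    result = {}
    for bucket in hash_distribute(partials, num_receivers):
        result.update(aggregate_sum(bucket))
    return result, len(partials)


fragments = [
    [("a", 1), ("b", 2), ("a", 3), ("a", 4)],
    [("b", 5), ("b", 6), ("a", 7), ("b", 8)],
]
print(one_phase_sum(fragments, 2))  # 8 rows shipped
print(two_phase_sum(fragments, 2))  # 4 rows shipped
```

With only two distinct keys per fragment, phase 1 collapses each fragment to two rows, so half as many rows cross the exchange; with few groups and many rows the saving grows accordingly, while with nearly unique keys phase 1 collapses little.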
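The difference between SUM/MIN/MAX and COUNT described in the issue can be shown with a small table of (phase 1, phase 2) functions. This is a hypothetical Python sketch, not Drill's function registry; it assumes every fragment is non-empty, since `min()` and `max()` raise on an empty list:

```python
# Hypothetical mapping: aggregate name -> (phase-1 function, phase-2 function).
PHASE_FUNCTIONS = {
    "SUM": (sum, sum),
    "MIN": (min, min),
    "MAX": (max, max),
    "COUNT": (len, sum),  # phase 1 counts rows, phase 2 SUMs the partial counts
}

# Single-phase reference semantics over all values at once.
SINGLE_PHASE = {"SUM": sum, "MIN": min, "MAX": max, "COUNT": len}


def two_phase(agg, fragments):
    """Apply the phase-1 function per fragment, then phase 2 to the partials."""
    first, second = PHASE_FUNCTIONS[agg]
    partials = [first(frag) for frag in fragments]
    return second(partials)


fragments = [[3, 9, 1], [7, 2], [5, 8, 6, 4]]
all_values = [v for frag in fragments for v in frag]

for agg in ("SUM", "MIN", "MAX", "COUNT"):
    assert two_phase(agg, fragments) == SINGLE_PHASE[agg](all_values)

# Naively reusing COUNT in phase 2 counts the partials instead of the rows:
assert len([len(frag) for frag in fragments]) == 3  # wrong: there are 9 rows
```

For SUM, MIN and MAX both phases use the same function; COUNT needs a different phase-2 function, which is why it falls outside this enhancement.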

This message was sent by Atlassian JIRA
