drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-690) Create 2-Phase aggregate plans for SUM, MIN, MAX
Date Mon, 12 May 2014 16:05:15 GMT

     [ https://issues.apache.org/jira/browse/DRILL-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-690:
-----------------------------

    Attachment: 0001-Generate-2-phase-plans-for-Hash-Aggr-and-Streaming-A.patch

> Create 2-Phase aggregate plans for SUM, MIN, MAX
> ------------------------------------------------
>
>                 Key: DRILL-690
>                 URL: https://issues.apache.org/jira/browse/DRILL-690
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>         Attachments: 0001-Generate-2-phase-plans-for-Hash-Aggr-and-Streaming-A.patch
>
>
> Currently, Drill generates 1-phase plans for aggregations with group-by: an initial
> distribution (if necessary) followed by either a sort + streaming aggregate or a hash
> aggregate.  In many cases, we should be able to do a 2-phase aggregation:
> Phase 1: local grouped aggregation, which potentially collapses the input to
>                a small number of groups,
> Intermediate step:  hash distribution (on grouping keys),
> Phase 2: final aggregation.
> The amount of data transferred over the network can potentially be much smaller than in
> the 1-phase approach.
> For aggregates such as SUM, MIN and MAX, phases 1 and 2 apply exactly the same aggregate
> function; for other aggregate functions such as COUNT, however, the first phase has to do a
> count and the second phase must SUM the counts.  This enhancement addresses only
> the functions SUM, MIN and MAX.
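
The two phases described above can be illustrated with a minimal sketch in plain Python. This is not Drill's planner or operator code: the function names (`local_aggregate`, `hash_distribute`, `final_aggregate`, `two_phase_aggregate`) and the in-memory "nodes" are assumptions made up for illustration. It only shows why SUM, MIN and MAX can reuse the same function in both phases, while COUNT needs a SUM in the second phase.

```python
# Illustrative sketch only; all names here are hypothetical, not Drill APIs.

# Binary combine step for each aggregate function.
COMBINE = {"sum": lambda a, b: a + b, "min": min, "max": max}

# Function applied in phase 2 for each phase-1 function.
# SUM, MIN and MAX map to themselves; COUNT must be finished with SUM.
PHASE2_FUNC = {"sum": "sum", "min": "min", "max": "max", "count": "sum"}


def local_aggregate(rows, func):
    """Phase 1: aggregate the (key, value) rows held by one node, per key."""
    combine = COMBINE["sum" if func == "count" else func]
    partial = {}
    for key, value in rows:
        # COUNT contributes 1 per row; the other functions use the value.
        v = 1 if func == "count" else value
        partial[key] = combine(partial[key], v) if key in partial else v
    return partial


def hash_distribute(partials, num_nodes):
    """Intermediate step: route each partial result by hash of its key."""
    buckets = [[] for _ in range(num_nodes)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_nodes].append((key, value))
    return buckets


def final_aggregate(bucket, func):
    """Phase 2: merge the partial results that arrived at one node."""
    combine = COMBINE[PHASE2_FUNC[func]]
    result = {}
    for key, value in bucket:
        result[key] = combine(result[key], value) if key in result else value
    return result


def two_phase_aggregate(node_inputs, func, num_nodes=2):
    """Run phase 1 on every node, redistribute, then run phase 2."""
    partials = [local_aggregate(rows, func) for rows in node_inputs]
    merged = {}
    for bucket in hash_distribute(partials, num_nodes):
        # Each key lands in exactly one bucket, so the results are disjoint.
        merged.update(final_aggregate(bucket, func))
    return merged
```

In this sketch only one partial row per group and per node crosses the distribution step, instead of every input row, which is the source of the potential network savings mentioned in the description.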



--
This message was sent by Atlassian JIRA
(v6.2#6252)
