drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacques Nadeau <jacq...@dremio.com>
Subject Re: Partial aggregation in Drill-on-Phoenix
Date Mon, 05 Oct 2015 16:52:07 GMT
Right now this type of work is done here:


With Distribution Trait application here:

To me, the easiest way to solve the Phoenix issue is by providing a rule
that matches HashAgg and StreamAgg but requires Phoenix convention as
input. It would replace everywhere but would only be plannable when it is
the first phase of aggregation.


Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Oct 1, 2015 at 2:30 PM, Julian Hyde <jhyde@apache.org> wrote:

> Phoenix is able to perform quite a few relational operations on the
> region server: scan, filter, project, aggregate, sort (optionally with
> limit). However, the sort and aggregate are necessarily "local". They
> can only deal with data on that region server, and there needs to be a
> further operation to combine the results from the region servers.
> The question is how to plan such queries. I think the answer is an
> AggregateExchangeTransposeRule.
> The rule would spot an Aggregate on a data source that is split into
> multiple locations (partitions) and split it into a partial Aggregate
> that computes sub-totals and a summarizing Aggregate that combines
> those totals.
> How does the planner know that the Aggregate needs to be split? Since
> the data's distribution has changed, there would need to be an
> Exchange operator. It is the Exchange operator that triggers the rule
> to fire.
> There are some special cases. If the data is sorted as well as
> partitioned (say because the local aggregate uses a sort-based
> algorithm) we could maybe use a more efficient plan. And if the
> partition key is the same as the aggregation key we don't need a
> summarizing Aggregate, just a Union.
> It turns out not to be very Phoenix-specific. In the Drill-on-Phoenix
> scenario, once the Aggregate has been pushed through the Exchange
> (i.e. onto the drill-bit residing on the region server) we can then
> push the DrillAggregate across the drill-to-phoenix membrane and make
> it into a PhoenixServerAggregate that executes in the region server.
> Related issues:
> * https://issues.apache.org/jira/browse/DRILL-3840
> * https://issues.apache.org/jira/browse/CALCITE-751
> Julian

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message