calcite-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: Adding Exchange operator and Distribution trait
Date Wed, 11 Feb 2015 22:21:33 GMT
Drill currently does query planning in two phases: 1) logical planning,
which handles join order, logical filter/project push-down, etc., and 2)
physical planning, which chooses between different physical operators
(different join/aggregation methods), applies storage-specific
filter/project push-down rules, and inserts EXCHANGE. Part of the reason
for splitting planning into two phases is that when the phases are merged,
planning time increases significantly (since the planner needs to enumerate
the different join orders, multiplied by the different choices of EXCHANGE).

The new rules that you are proposing seem intended to build the plan in a
single logical planning phase. I'm not sure how that will impact the overall
planning time.
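
To put rough numbers on the "join orders multiplied by EXCHANGE choices" concern, here is a back-of-the-envelope sketch. The counting model is purely an assumption for illustration (left-deep join orders only, an independent choice among k exchange strategies per join, no memoization or pruning); it is not how Drill or Calcite actually enumerate or cost plans, and the class and method names are hypothetical.

```java
/** Rough plan counting under a deliberately simplified, assumed model. */
public class PlanSpaceSketch {

    /** Left-deep join orders over n tables: n! (assumption: no bushy plans). */
    static long leftDeepJoinOrders(int tables) {
        long result = 1;
        for (int i = 2; i <= tables; i++) {
            result *= i;
        }
        return result;
    }

    /** One of k exchange strategies chosen independently for each of the n-1 joins. */
    static long exchangeAssignments(int tables, int strategiesPerJoin) {
        long result = 1;
        for (int i = 1; i < tables; i++) {
            result *= strategiesPerJoin;
        }
        return result;
    }

    /** Merged planning: every join order is paired with every exchange assignment. */
    static long mergedPhaseCandidates(int tables, int strategiesPerJoin) {
        return leftDeepJoinOrders(tables) * exchangeAssignments(tables, strategiesPerJoin);
    }

    /** Two-phase planning: fix one join order first, then enumerate exchanges for it. */
    static long twoPhaseCandidates(int tables, int strategiesPerJoin) {
        return leftDeepJoinOrders(tables) + exchangeAssignments(tables, strategiesPerJoin);
    }

    public static void main(String[] args) {
        int tables = 5;
        int strategies = 3;
        System.out.println("merged:    " + mergedPhaseCandidates(tables, strategies));
        System.out.println("two-phase: " + twoPhaseCandidates(tables, strategies));
    }
}
```

Under these assumptions, 5 tables and 3 exchange strategies give 120 * 81 = 9720 candidates for a merged phase versus 120 + 81 = 201 for two phases. A memoizing planner shares sub-plans, so real numbers will differ, but the multiplicative-versus-additive shape is the point.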



On Wed, Feb 11, 2015 at 1:38 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:

> I think it's a good proposal to put Exchange/Distribution into Calcite
> library.
>
> Makes sense to me.  +1
>
>
>
> On Wed, Feb 11, 2015 at 12:45 PM, Julian Hyde <jhyde@apache.org> wrote:
>
>> Drill guys: What do you think of the proposal?
>>
>> On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <hashutosh@apache.org>
>> wrote:
>>
>> Overall proposal sounds good to me. +1
>>
>> On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <jhyde@apache.org> wrote:
>>
>> I've had some discussions about adding an Exchange operator and
>> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
>> Ashutosh has logged a bug [
>> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
>> containing a proof-of-concept [
>> https://github.com/apache/incubator-calcite/pull/52/files ].
>>
>> I know that Drill has a Distribution trait and several sub-classes of
>> Exchange operator (DrillDistributionTrait, ExchangePrel,
>> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
>> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
>> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical )
>>
>> I propose to create a Distribution trait and Exchange operator base class
>> in Calcite, with the goal that both Drill and Hive would use them. (I am
>> adopting Drill terminology -- Distribution rather than Partition, Exchange
>> rather than Shuffle -- but I am pretty sure that the concepts are the
>> same.)
>>
>> public abstract class Exchange extends SingleRel {
>>  public final RelDistribution distribution;
>>
>>  protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
>>      RelNode input, RelDistribution distribution) {
>>    super(cluster, traitSet, input);
>>    this.distribution = distribution;
>>  }
>> }
>>
>> public interface RelDistribution extends RelMultipleTrait {
>>  enum DistributionType {
>>    SINGLETON,
>>    HASH_DISTRIBUTED,
>>    RANGE_DISTRIBUTED,
>>    RANDOM_DISTRIBUTED,
>>    ROUND_ROBIN_DISTRIBUTED,
>>    BROADCAST_DISTRIBUTED
>>  }
>>
>>  public DistributionType getType();
>>  public ImmutableIntList getFields();
>> }
>>
>> Calcite would not contain any particular exchange algorithms. However,
>> since it is common to combine sort and exchange, I would create a base
>> class for it:
>>
>> public abstract class SortExchange extends Exchange {
>>  public final RelCollation collation;
>>
>>  ...
>> }
>>
>> The physical operators would remain in Drill/Hive and would likely be fully
>> specified by the distribution and collation; they would not need any
>> additional attributes. We would not be able to port
>> DrillDistributionTraitDef.convert directly -- it would create a
>> LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
>> LogicalSort) and then Drill rules would need to kick in to convert that to
>> HashToRandomExchangePrel etc.
>>
>> I do not think that RelDistribution needs to be a "multiple" trait (compare
>> with RelCollation extends RelMultipleTrait, which allows a RelNode to have
>> more than one sort-order) but I may be wrong.
>>
>> The advantages of making Exchange a first-class operator and Distribution a
>> trait are clear. We will be able to build a library of rules (e.g.
>> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
>> interface, and start working on stats and cost model.
>>
>> Drill and Hive stakeholders, please let me know what you think of this
>> plan.
>>
>> Julian
>>
>
>
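
As a way to picture the ExchangeRemoveRule idea mentioned in the quoted proposal, here is a minimal, self-contained sketch. It deliberately does not use Calcite classes: `Node`, `Distribution`, `Exchange` and `removeRedundantExchange` are hypothetical stand-ins invented for illustration, and the rule shown (drop an Exchange whose input already has exactly the requested distribution) is one assumed reading of what such a rule might do, not the actual Calcite, Drill or Hive implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ExchangeSketch {

    /** Distribution kinds, copied from the enum in the quoted proposal. */
    enum DistributionType {
        SINGLETON, HASH_DISTRIBUTED, RANGE_DISTRIBUTED,
        RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED, BROADCAST_DISTRIBUTED
    }

    /** Toy stand-in for the proposed RelDistribution trait: a type plus key fields. */
    static final class Distribution {
        final DistributionType type;
        final List<Integer> fields;

        Distribution(DistributionType type, Integer... fields) {
            this.type = type;
            this.fields = Collections.unmodifiableList(Arrays.asList(fields));
        }

        /** Assumption: only an exact match of type and key fields counts as satisfied. */
        boolean sameAs(Distribution other) {
            return type == other.type && fields.equals(other.fields);
        }
    }

    /** Toy stand-in for a relational node that carries a distribution. */
    static class Node {
        final String name;
        final Node input; // null for leaf nodes such as scans
        final Distribution distribution;

        Node(String name, Node input, Distribution distribution) {
            this.name = name;
            this.input = input;
            this.distribution = distribution;
        }
    }

    /** Toy Exchange: redistributes its single input to the given distribution. */
    static final class Exchange extends Node {
        Exchange(Node input, Distribution distribution) {
            super("Exchange", input, distribution);
        }
    }

    /** Hypothetical rule: an Exchange over an already-matching input is a no-op. */
    static Node removeRedundantExchange(Node node) {
        if (node instanceof Exchange && node.input.distribution.sameAs(node.distribution)) {
            return node.input;
        }
        return node;
    }

    public static void main(String[] args) {
        Distribution hashOnField0 = new Distribution(DistributionType.HASH_DISTRIBUTED, 0);
        Distribution hashOnField1 = new Distribution(DistributionType.HASH_DISTRIBUTED, 1);
        Node scan = new Node("Scan", null, hashOnField0);

        // Same distribution as the scan: the exchange is dropped.
        System.out.println(removeRedundantExchange(new Exchange(scan, hashOnField0)).name);
        // Different hash key: the exchange must stay.
        System.out.println(removeRedundantExchange(new Exchange(scan, hashOnField1)).name);
    }
}
```

A real rule would likely need a weaker "satisfies" check than exact equality, and would run inside the planner rather than as a standalone function; the sketch only shows why having the distribution available as a trait makes such a rule straightforward to express.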
