calcite-dev mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Adding Exchange operator and Distribution trait
Date Wed, 11 Feb 2015 19:34:49 GMT
Overall proposal sounds good to me. +1

On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <jhyde@apache.org> wrote:

> I've had some discussions about adding an Exchange operator and
> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
> Ashutosh has logged a bug [
> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
> containing a proof-of-concept [
> https://github.com/apache/incubator-calcite/pull/52/files ].
>
> I know that Drill has a Distribution trait and several sub-classes of
> Exchange operator (DrillDistributionTrait, ExchangePrel,
> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
>
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
> )
>
> I propose to create a Distribution trait and Exchange operator base class
> in Calcite, with the goal that both Drill and Hive would use them. (I am
> adopting Drill terminology -- Distribution rather than Partition, Exchange
> rather than Shuffle -- but I am pretty sure that the concepts are the same.)
>
> public abstract class Exchange extends SingleRel {
>   public final RelDistribution distribution;
>
>   protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
>       RelNode input, RelDistribution distribution) {
>     super(cluster, traitSet, input);
>     this.distribution = distribution;
>   }
> }
>
> public interface RelDistribution extends RelMultipleTrait {
>   enum DistributionType {
>     SINGLETON,
>     HASH_DISTRIBUTED,
>     RANGE_DISTRIBUTED,
>     RANDOM_DISTRIBUTED,
>     ROUND_ROBIN_DISTRIBUTED,
>     BROADCAST_DISTRIBUTED
>   }
>
>   public DistributionType getType();
>   public ImmutableIntList getFields();
> }
>
> Calcite would not contain any particular exchange algorithms. However,
> since it is common to combine sort and exchange, I would create a base
> class for it:
>
> public abstract class SortExchange extends Exchange {
>   public final RelCollation collation;
>
>   ...
> }
>
> The physical operators would remain in Drill/Hive and would likely be fully
> specified by the distribution and collation; they would not need any
> additional attributes. We would not be able to port
> DrillDistributionTraitDef.convert directly -- it would create a
> LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
> LogicalSort) and then Drill rules would need to kick in to convert that to
> HashToRandomExchangePrel etc.
>
> I do not think that RelDistribution needs to be a "multiple" trait (compare
> with RelCollation extends RelMultipleTrait, which allows a RelNode to have
> more than one sort-order) but I may be wrong.
>
> The advantages of making Exchange a first-class operator and Distribution a
> trait are clear. We will be able to build a library of rules (e.g.
> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
> interface, and start working on stats and cost model.
>
> Drill and Hive stakeholders, please let me know what you think of this
> plan.
>
> Julian
>
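
To make the ExchangeRemoveRule idea above concrete, here is a minimal sketch, not Calcite code. It assumes a simplified stand-in for RelDistribution (a type plus field ordinals) and a hypothetical helper, removeRedundantExchange, that drops an Exchange whose input already has the requested distribution. The names ExchangeSketch, Distribution, Node, Scan and removeRedundantExchange are invented for illustration, and "satisfies" is reduced to plain equality, which a real trait would likely relax.

```java
import java.util.Arrays;
import java.util.List;

public class ExchangeSketch {
  enum DistributionType {
    SINGLETON, HASH_DISTRIBUTED, RANGE_DISTRIBUTED,
    RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED, BROADCAST_DISTRIBUTED
  }

  /** Hypothetical stand-in for RelDistribution: a type plus key field ordinals. */
  static final class Distribution {
    final DistributionType type;
    final List<Integer> fields;

    Distribution(DistributionType type, Integer... fields) {
      this.type = type;
      this.fields = Arrays.asList(fields);
    }

    /** Simplifying assumption: only an identical distribution satisfies a requirement. */
    boolean satisfies(Distribution required) {
      return type == required.type && fields.equals(required.fields);
    }
  }

  /** Hypothetical plan node; every node reports how its output is distributed. */
  interface Node {
    Distribution distribution();
  }

  /** Leaf node with a fixed distribution. */
  static final class Scan implements Node {
    private final Distribution distribution;

    Scan(Distribution distribution) {
      this.distribution = distribution;
    }

    public Distribution distribution() {
      return distribution;
    }
  }

  /** Single-input node that redistributes its input to the given distribution. */
  static final class Exchange implements Node {
    final Node input;
    private final Distribution distribution;

    Exchange(Node input, Distribution distribution) {
      this.input = input;
      this.distribution = distribution;
    }

    public Distribution distribution() {
      return distribution;
    }
  }

  /** Returns the input directly when the Exchange would not change the distribution. */
  static Node removeRedundantExchange(Node node) {
    if (node instanceof Exchange) {
      Exchange exchange = (Exchange) node;
      if (exchange.input.distribution().satisfies(exchange.distribution)) {
        return exchange.input;
      }
    }
    return node;
  }

  public static void main(String[] args) {
    Distribution hashOn0 = new Distribution(DistributionType.HASH_DISTRIBUTED, 0);
    Distribution hashOn1 = new Distribution(DistributionType.HASH_DISTRIBUTED, 1);
    Node scan = new Scan(hashOn0);

    Node redundant = new Exchange(scan, hashOn0);
    Node needed = new Exchange(scan, hashOn1);

    System.out.println("redundant exchange removed: "
        + (removeRedundantExchange(redundant) == scan));
    System.out.println("needed exchange kept: "
        + (removeRedundantExchange(needed) == needed));
  }
}
```

In a real planner this check would live in a rule that fires on Exchange nodes; the sketch only shows the decision the rule would make.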
