calcite-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: Adding Exchange operator and Distribution trait
Date Wed, 11 Feb 2015 20:45:02 GMT
Drill guys: What do you think of the proposal?

On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <hashutosh@apache.org> wrote:

Overall proposal sounds good to me. +1

On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <jhyde@apache.org> wrote:

I've had some discussions about adding an Exchange operator and
Distribution trait to Hive's cost-based optimizer, which uses Calcite.
Ashutosh has logged a bug
[ https://issues.apache.org/jira/browse/CALCITE-594 ] and a pull request
containing a proof-of-concept
[ https://github.com/apache/incubator-calcite/pull/52/files ].

I know that Drill has a Distribution trait and several sub-classes of
Exchange operator (DrillDistributionTrait, ExchangePrel,
BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
).

I propose to create a Distribution trait and Exchange operator base class
in Calcite, with the goal that both Drill and Hive would use them. (I am
adopting Drill terminology -- Distribution rather than Partition, Exchange
rather than Shuffle -- but I am pretty sure that the concepts are the same.)

public abstract class Exchange extends SingleRel {
  public final RelDistribution distribution;

  protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
      RelNode input, RelDistribution distribution) {
    super(cluster, traitSet, input);
    this.distribution = distribution;
  }
}

public interface RelDistribution extends RelMultipleTrait {
 enum DistributionType {
   SINGLETON,
   HASH_DISTRIBUTED,
   RANGE_DISTRIBUTED,
   RANDOM_DISTRIBUTED,
   ROUND_ROBIN_DISTRIBUTED,
   BROADCAST_DISTRIBUTED
 }

 public DistributionType getType();
 public ImmutableIntList getFields();
}
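
To illustrate how a planner might use the type and fields above to decide whether an Exchange is needed, here is a minimal, dependency-free sketch. It is not part of the proposal and does not use Calcite classes: DistType, DistSketch, satisfies and needsExchange are hypothetical names, and the exact-match rule is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the proposed trait. These are NOT Calcite classes;
// only the enum constants are taken from the proposal.
enum DistType {
  SINGLETON, HASH_DISTRIBUTED, RANGE_DISTRIBUTED,
  RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED, BROADCAST_DISTRIBUTED
}

final class DistSketch {
  final DistType type;
  final List<Integer> fields; // ordinals of the distribution key fields

  DistSketch(DistType type, Integer... fields) {
    this.type = type;
    this.fields =
        Collections.unmodifiableList(new ArrayList<>(Arrays.asList(fields)));
  }

  // Assumption: a distribution satisfies a requirement only when the type
  // and the key fields match exactly. A real planner would likely accept
  // more cases than this.
  boolean satisfies(DistSketch required) {
    return type == required.type && fields.equals(required.fields);
  }

  // An Exchange is needed exactly when the input does not already satisfy
  // the required distribution.
  static boolean needsExchange(DistSketch actual, DistSketch required) {
    return !actual.satisfies(required);
  }
}
```

Under these assumptions, an input hashed on field 0 needs no Exchange when hash-on-field-0 is required, but does need one when SINGLETON or hash-on-field-1 is required.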

Calcite would not contain any particular exchange algorithms. However,
since it is common to combine sort and exchange, I would create a base
class for it:

public abstract class SortExchange extends Exchange {
  public final RelCollation collation;

  ...
}

The physical operators would remain in Drill/Hive and would likely be fully
specified by the distribution and collation; they would not need any
additional attributes. We would not be able to port
DrillDistributionTraitDef.convert directly -- the Calcite version would
create a LogicalExchange (analogous to how RelCollationTraitDef.convert
creates a LogicalSort) and then Drill rules would need to kick in to convert
that to HashToRandomExchangePrel etc.
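
The two-step conversion described above can be sketched roughly as follows. This is an illustration only: PlanNode, ScanSketch, LogicalExchangeSketch, PhysicalHashExchangeSketch and ConversionSketch are hypothetical stand-ins, not Calcite or Drill classes, and the distribution is reduced to a plain string.

```java
// Hypothetical stand-ins for planner nodes; none of these exist in
// Calcite or Drill.
interface PlanNode {
  String describe();
}

final class ScanSketch implements PlanNode {
  final String table;
  ScanSketch(String table) { this.table = table; }
  public String describe() { return "Scan(" + table + ")"; }
}

final class LogicalExchangeSketch implements PlanNode {
  final PlanNode input;
  final String distribution;
  LogicalExchangeSketch(PlanNode input, String distribution) {
    this.input = input;
    this.distribution = distribution;
  }
  public String describe() {
    return "LogicalExchange[" + distribution + "](" + input.describe() + ")";
  }
}

final class PhysicalHashExchangeSketch implements PlanNode {
  final PlanNode input;
  final String distribution;
  PhysicalHashExchangeSketch(PlanNode input, String distribution) {
    this.input = input;
    this.distribution = distribution;
  }
  public String describe() {
    return "PhysicalHashExchange[" + distribution + "]("
        + input.describe() + ")";
  }
}

final class ConversionSketch {
  // Step 1: the engine-neutral trait conversion knows only the logical
  // operator, so it wraps the input in a logical exchange.
  static PlanNode convert(PlanNode input, String requiredDistribution) {
    return new LogicalExchangeSketch(input, requiredDistribution);
  }

  // Step 2: an engine-specific rule later rewrites the logical exchange
  // into a physical one. Other nodes are returned unchanged.
  static PlanNode implement(PlanNode node) {
    if (node instanceof LogicalExchangeSketch) {
      LogicalExchangeSketch e = (LogicalExchangeSketch) node;
      return new PhysicalHashExchangeSketch(implement(e.input), e.distribution);
    }
    return node;
  }
}
```

The point of the split is that step 1 can live in the shared library while step 2 stays in each engine, which chooses its own physical exchange.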

I do not think that RelDistribution needs to be a "multiple" trait (compare
with RelCollation extends RelMultipleTrait, which allows a RelNode to have
more than one sort-order) but I may be wrong.

The advantages of making Exchange a first-class operator and Distribution a
trait are clear. We will be able to build a library of rules (e.g.
FilterExchangePushRule, ExchangeRemoveRule) and a RelMdDistribution metadata
interface, and start working on statistics and a cost model.

Drill and Hive stakeholders, please let me know what you think of this
plan.

Julian
