calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Adding Exchange operator and Distribution trait
Date Tue, 10 Feb 2015 23:35:39 GMT
I’ve had some discussions about adding an Exchange operator and
Distribution trait to Hive’s cost-based optimizer, which uses Calcite.
Ashutosh has logged a bug [
https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
containing a proof-of-concept [
https://github.com/apache/incubator-calcite/pull/52/files ].

I know that Drill has a Distribution trait and several sub-classes of
Exchange operator (DrillDistributionTrait, ExchangePrel,
BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
)

I propose to create a Distribution trait and Exchange operator base class
in Calcite, with the goal that both Drill and Hive would use them. (I am
adopting Drill terminology -- Distribution rather than Partition, Exchange
rather than Shuffle — but I am pretty sure that the concepts are the same.)

public abstract class Exchange extends SingleRel {
  public final RelDistribution distribution;

  protected Exchange(RelCluster cluster, RelTraitSet traitSet, RelNode
input, RelDistribution distribution) {
    super(cluster, traitSet, input);
    this.distribution = distribution;
  }
}

public interface RelDistribution extends RelMultipleTrait {
  enum DistributionType {
    SINGLETON,
    HASH_DISTRIBUTED,
    RANGE_DISTRIBUTED,
    RANDOM_DISTRIBUTED,
    ROUND_ROBIN_DISTRIBUTED,
    BROADCAST_DISTRIBUTED
  }

  public DistributionType getType();
  public ImmutableIntList getFields();
}

Calcite would not contain any particular exchange algorithms. However,
since it is common to combine sort and exchange, I would create a base
class for it:

public abstract class SortExchange extends Exchange {
  public final Collation collation;

  …
}

The physical operators would remain in Drill/Hive and would likely be fully
specified by the distribution and collation; they would not need any
additional attributes. We would not be able to port
DrillDistributionTraitDef.convert directly — it would create a
LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
LogicalSort) and then Drill rules would need to kick in to convert that to
HashToRandomExchangePrel etc.

I do not think that RelDistribution needs to be a “multiple” trait (compare
with RelCollation extends RelMultipleTrait, which allows a RelNode to have
more than one sort-order) but I may be wrong.

The advantages of making Exchange a first-class operator and Distribution a
trait are clear. We will be able to build a library of rules (e.g.
FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
interface, and start working on stats and cost model.

Drill and Hive stakeholders, please let me know what you think of this plan.

Julian

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message