calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <>
Subject Adding Exchange operator and Distribution trait
Date Tue, 10 Feb 2015 23:35:39 GMT
I’ve had some discussions about adding an Exchange operator and
Distribution trait to Hive’s cost-based optimizer, which uses Calcite.
Ashutosh has logged a bug [ ] and pull request
containing a proof-of-concept [ ].

I know that Drill has a Distribution trait and several sub-classes of
Exchange operator (DrillDistributionTrait, ExchangePrel,
BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in

I propose to create a Distribution trait and Exchange operator base class
in Calcite, with the goal that both Drill and Hive would use them. (I am
adopting Drill terminology -- Distribution rather than Partition, Exchange
rather than Shuffle — but I am pretty sure that the concepts are the same.)

public abstract class Exchange extends SingleRel {
  public final RelDistribution distribution;

  protected Exchange(RelCluster cluster, RelTraitSet traitSet, RelNode
input, RelDistribution distribution) {
    super(cluster, traitSet, input);
    this.distribution = distribution;

public interface RelDistribution extends RelMultipleTrait {
  enum DistributionType {

  public DistributionType getType();
  public ImmutableIntList getFields();

Calcite would not contain any particular exchange algorithms. However,
since it is common to combine sort and exchange, I would create a base
class for it:

public abstract class SortExchange extends Exchange {
  public final Collation collation;


The physical operators would remain in Drill/Hive and would likely be fully
specified by the distribution and collation; they would not need any
additional attributes. We would not be able to port
DrillDistributionTraitDef.convert directly — it would create a
LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
LogicalSort) and then Drill rules would need to kick in to convert that to
HashToRandomExchangePrel etc.

I do not think that RelDistribution needs to be a “multiple” trait (compare
with RelCollation extends RelMultipleTrait, which allows a RelNode to have
more than one sort-order) but I may be wrong.

The advantages of making Exchange a first-class operator and Distribution a
trait are clear. We will be able to build a library of rules (e.g.
FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
interface, and start working on stats and cost model.

Drill and Hive stakeholders, please let me know what you think of this plan.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message