calcite-dev mailing list archives

From Julian Hyde <>
Subject Re: Adding Exchange operator and Distribution trait
Date Wed, 11 Feb 2015 20:45:02 GMT
Drill guys: What do you think of the proposal?

On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <> wrote:

Overall proposal sounds good to me. +1

On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <> wrote:

I've had some discussions about adding an Exchange operator and
Distribution trait to Hive's cost-based optimizer, which uses Calcite.
Ashutosh has logged a bug [ ] and pull request
containing a proof-of-concept [ ].

I know that Drill has a Distribution trait and several sub-classes of
Exchange operator (DrillDistributionTrait, ExchangePrel,
BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in

I propose to create a Distribution trait and Exchange operator base class
in Calcite, with the goal that both Drill and Hive would use them. (I am
adopting Drill terminology -- Distribution rather than Partition, Exchange
rather than Shuffle -- but I am pretty sure that the concepts are the same.)

public abstract class Exchange extends SingleRel {
  public final RelDistribution distribution;

  protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
      RelNode input, RelDistribution distribution) {
    super(cluster, traitSet, input);
    this.distribution = distribution;
  }
}

public interface RelDistribution extends RelTrait {
  enum DistributionType {
    ...
  }

  public DistributionType getType();
  public ImmutableIntList getFields();
}
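
To make the idea of a distribution trait concrete, here is a minimal, self-contained sketch. It is not Calcite code: SketchDistribution, SketchDistributionType, its three constants and the satisfies method are all hypothetical names chosen for illustration, and the matching rule is deliberately strict (same type and same key fields). It shows how a trait made of a type plus key field ordinals could be used to decide whether data already has the distribution a consumer requires.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for DistributionType; these constants are illustrative only. */
enum SketchDistributionType { SINGLETON, HASH, ANY }

/** Hypothetical, simplified distribution trait: a type plus key field ordinals. */
final class SketchDistribution {
  final SketchDistributionType type;
  final List<Integer> fields;

  SketchDistribution(SketchDistributionType type, Integer... fields) {
    this.type = type;
    // Defensive copy so the trait value is immutable.
    this.fields = Collections.unmodifiableList(Arrays.asList(fields.clone()));
  }

  /** Returns whether data distributed like this already meets the requirement. */
  boolean satisfies(SketchDistribution required) {
    if (required.type == SketchDistributionType.ANY) {
      return true; // the consumer does not care how the data is distributed
    }
    // Assumption: strict match on type and key fields; a real planner may be laxer.
    return type == required.type && fields.equals(required.fields);
  }
}

final class DistributionSketchDemo {
  public static void main(String[] args) {
    SketchDistribution hashOn0 = new SketchDistribution(SketchDistributionType.HASH, 0);
    SketchDistribution hashOn1 = new SketchDistribution(SketchDistributionType.HASH, 1);
    SketchDistribution any = new SketchDistribution(SketchDistributionType.ANY);
    System.out.println(hashOn0.satisfies(any));
    System.out.println(hashOn0.satisfies(hashOn1));
  }
}
```

When satisfies returns false, the planner would need to insert an Exchange to redistribute the data.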

Calcite would not contain any particular exchange algorithms. However,
since it is common to combine sort and exchange, I would create a base
class for it:

public abstract class SortExchange extends Exchange {
  public final RelCollation collation;
}


The physical operators would remain in Drill/Hive and would likely be fully
specified by the distribution and collation; they would not need any
additional attributes. We would not be able to port
DrillDistributionTraitDef.convert directly: the Calcite version would create a
LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
LogicalSort), and Drill rules would then need to kick in to convert that to
HashToRandomExchangePrel etc.
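
The two-step flow described above (a trait conversion that produces an engine-neutral exchange, followed by an engine rule that swaps in a physical one) can be sketched as follows. This is a toy model rather than Calcite or Drill code: every class and method name here (ConvNode, ConvScan, ConvLogicalExchange, ConvPhysicalHashExchange, ConvPlanner.convert, ConvPlanner.implement) is hypothetical, and the distribution is reduced to a plain string.

```java
/** Hypothetical minimal plan node with at most one input. */
abstract class ConvNode {
  final ConvNode input; // null for a leaf

  ConvNode(ConvNode input) {
    this.input = input;
  }

  abstract String describe();
}

final class ConvScan extends ConvNode {
  final String table;

  ConvScan(String table) {
    super(null);
    this.table = table;
  }

  @Override String describe() {
    return "Scan(" + table + ")";
  }
}

/** Engine-neutral exchange, as a trait-def convert step might create. */
final class ConvLogicalExchange extends ConvNode {
  final String distribution;

  ConvLogicalExchange(ConvNode input, String distribution) {
    super(input);
    this.distribution = distribution;
  }

  @Override String describe() {
    return "LogicalExchange[" + distribution + "](" + input.describe() + ")";
  }
}

/** Hypothetical engine-specific exchange, standing in for a class such as HashToRandomExchangePrel. */
final class ConvPhysicalHashExchange extends ConvNode {
  final String distribution;

  ConvPhysicalHashExchange(ConvNode input, String distribution) {
    super(input);
    this.distribution = distribution;
  }

  @Override String describe() {
    return "PhysicalHashExchange[" + distribution + "](" + input.describe() + ")";
  }
}

final class ConvPlanner {
  /** Step 1: enforce a required distribution by wrapping the input in a logical exchange. */
  static ConvNode convert(ConvNode input, String requiredDistribution) {
    return new ConvLogicalExchange(input, requiredDistribution);
  }

  /** Step 2: an engine rule replaces the logical exchange with a physical one. */
  static ConvNode implement(ConvNode node) {
    if (node instanceof ConvLogicalExchange) {
      ConvLogicalExchange exchange = (ConvLogicalExchange) node;
      return new ConvPhysicalHashExchange(exchange.input, exchange.distribution);
    }
    return node; // nothing to do for other nodes
  }

  public static void main(String[] args) {
    ConvNode logical = convert(new ConvScan("emp"), "hash(0)");
    System.out.println(logical.describe());
    System.out.println(implement(logical).describe());
  }
}
```

Keeping the first step engine-neutral is what would let the same conversion be shared, with each engine supplying only its own second step.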

I do not think that RelDistribution needs to be a "multiple" trait (compare
with RelCollation extends RelMultipleTrait, which allows a RelNode to have
more than one sort-order) but I may be wrong.

The advantages of making Exchange a first-class operator and Distribution a
trait are clear. We will be able to build a library of rules (e.g.
FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
interface, and start working on stats and cost model.
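
As an illustration of the kind of rule such a library might hold, here is a sketch of one plausible reading of ExchangeRemoveRule: drop an exchange whose input is already distributed the way the exchange would distribute it. The message does not define the rule's semantics, so this behaviour is an assumption, and the classes below (RuleNode, RuleScan, RuleExchange, ExchangeRemoveSketch) are hypothetical stand-ins rather than Calcite classes.

```java
/** Hypothetical plan node that reports how its output is distributed. */
abstract class RuleNode {
  abstract String distribution();

  abstract String describe();
}

final class RuleScan extends RuleNode {
  final String table;
  final String distribution;

  RuleScan(String table, String distribution) {
    this.table = table;
    this.distribution = distribution;
  }

  @Override String distribution() {
    return distribution;
  }

  @Override String describe() {
    return "Scan(" + table + ")";
  }
}

final class RuleExchange extends RuleNode {
  final RuleNode input;
  final String distribution;

  RuleExchange(RuleNode input, String distribution) {
    this.input = input;
    this.distribution = distribution;
  }

  @Override String distribution() {
    return distribution;
  }

  @Override String describe() {
    return "Exchange[" + distribution + "](" + input.describe() + ")";
  }
}

final class ExchangeRemoveSketch {
  /** Removes exchanges that would not change the distribution of their input. */
  static RuleNode apply(RuleNode node) {
    if (!(node instanceof RuleExchange)) {
      return node;
    }
    RuleExchange exchange = (RuleExchange) node;
    RuleNode input = apply(exchange.input); // simplify stacked exchanges first
    if (input.distribution().equals(exchange.distribution)) {
      return input; // redundant: the input is already distributed as required
    }
    return new RuleExchange(input, exchange.distribution);
  }

  public static void main(String[] args) {
    RuleNode scan = new RuleScan("emp", "hash(0)");
    System.out.println(apply(new RuleExchange(scan, "hash(0)")).describe());
    System.out.println(apply(new RuleExchange(scan, "hash(1)")).describe());
  }
}
```

A real rule would compare distribution traits rather than strings, and a cost model would decide between alternative plans instead of rewriting eagerly.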

Drill and Hive stakeholders, please let me know what you think of this
proposal.

