calcite-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: Adding Exchange operator and Distribution trait
Date Wed, 11 Feb 2015 23:23:02 GMT
Regarding John's comment about putting bounds on the plan search space: does
Calcite allow us to specify bounds in the planner, and to stop the search
with the best plan found so far once those bounds are met?

AFAIK, in TpchTest, if I turn on "calcite.test.slow", some queries such as
TPCH Q5, Q7 and Q8 do not come back with a plan after several minutes when
run on my laptop. If Calcite had the ability to specify search bounds (say,
the number of rules fired or the number of possible plans enumerated), it
could return a plan within a reasonable amount of time, instead of searching
on and on and possibly never finishing.
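
To make the idea concrete, here is a minimal sketch of a search that stops
after a fixed budget and returns the best plan seen so far. It is plain Java
and does not use Calcite's planner API: BoundedPlanSearch, Result and
maxExpansions are hypothetical names made up for illustration, the "plans"
are just integers, and the "rule" is a toy successor function.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/** Hypothetical illustration only; none of these names are Calcite API. */
final class BoundedPlanSearch {

  /** Outcome of a budgeted search. */
  static final class Result<P> {
    final P best;            // cheapest plan seen before the search stopped
    final int expansions;    // number of plans expanded (stand-in for "rules fired")
    final boolean exhausted; // true if the whole space was explored within budget

    Result(P best, int expansions, boolean exhausted) {
      this.best = best;
      this.expansions = expansions;
      this.exhausted = exhausted;
    }
  }

  /**
   * Breadth-first exploration of alternative plans that gives up after
   * maxExpansions expansions and returns the cheapest plan found so far.
   */
  static <P> Result<P> search(P initial,
                              Function<P, List<P>> expand,
                              ToDoubleFunction<P> cost,
                              int maxExpansions) {
    Deque<P> frontier = new ArrayDeque<>();
    Set<P> seen = new HashSet<>();
    frontier.add(initial);
    seen.add(initial);

    P best = initial;
    double bestCost = cost.applyAsDouble(initial);
    int expansions = 0;

    while (!frontier.isEmpty() && expansions < maxExpansions) {
      P plan = frontier.poll();
      expansions++;
      for (P alternative : expand.apply(plan)) {
        if (!seen.add(alternative)) {
          continue; // already registered; do not enumerate it twice
        }
        double c = cost.applyAsDouble(alternative);
        if (c < bestCost) {
          best = alternative;
          bestCost = c;
        }
        frontier.add(alternative);
      }
    }
    return new Result<>(best, expansions, frontier.isEmpty());
  }

  public static void main(String[] args) {
    // Toy search space: plan n has a single alternative n + 1, up to 1000.
    Function<Integer, List<Integer>> expand =
        n -> n < 1000 ? Collections.singletonList(n + 1)
                      : Collections.<Integer>emptyList();
    // Toy cost model: the cheapest "plan" is 10.
    ToDoubleFunction<Integer> cost = n -> Math.abs(n - 10);

    Result<Integer> bounded = search(0, expand, cost, 5);
    Result<Integer> full = search(0, expand, cost, Integer.MAX_VALUE);
    System.out.println("bounded: best=" + bounded.best + " exhausted=" + bounded.exhausted);
    System.out.println("full:    best=" + full.best + " exhausted=" + full.exhausted);
  }
}
```

With a small budget the caller gets a valid but possibly sub-optimal plan,
and the exhausted flag says whether the answer is known to be the best one.
A bound on the number of plans enumerated could be sketched the same way by
counting entries added to seen instead of expansions.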



On Wed, Feb 11, 2015 at 2:49 PM, Julian Hyde <julianhyde@gmail.com> wrote:

> Aman,
>
> RelDistribution will be an interface, and there’s no reason why Drill
> shouldn’t have its own values or even sub-classes, as long as
> RelDistributionTraitDef is able to canonize them. So you could, for
> instance, sub-class "Hash[1, 3]" and specify which hash function is being
> used.
>
> I’ve addressed the comment about logical exchange already — you can go
> straight to physical.
>
>
> On Feb 11, 2015, at 2:34 PM, Aman Sinha <asinha@maprtech.com> wrote:
>
> > I am neutral on this for now, until we give it more thought. The reason
> > is that Calcite is not aware of the execution engine's capabilities and
> > configuration parameters for distribution (e.g. Drill has a few
> > parameters, including simple true/false flags, that determine whether
> > or not an Exchange node is even inserted in the plan and, if it is,
> > what type of Exchange it is). In that sense, if the logical plan
> > produced by Calcite contains a LogicalExchange, it is possible that Drill
> > may not be able to use it directly while building the physical plan.
> >
> > I do however see the benefits in terms of trait propagation, combining
> > distribution and collation traits and consolidating the subsumption logic
> > in some base class such that it is useful for other consumers of Calcite.
> >
> > Aman
> >
> > On Wed, Feb 11, 2015 at 2:21 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> >
> >> Drill currently does query planning in two phases: 1) logical planning,
> >> which handles join order, logical filter/project push-down, etc., and 2)
> >> physical planning, which chooses between different physical operators
> >> (different join/aggregation methods), applies storage-specific
> >> filter/project push-down rules, and inserts EXCHANGE. Part of the reason
> >> for using two phases is that when they are merged, the planning time
> >> increases significantly (since the planner needs to enumerate different
> >> join orders, multiplied by different choices of EXCHANGE).
> >>
> >> The new rules that you are proposing seem to want to build the plan in a
> >> single logical planning phase. I'm not sure how that will impact the
> >> overall planning time.
> >>
> >>
> >>
> >>> On Wed, Feb 11, 2015 at 1:38 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> >>>
> >>> I think it's a good proposal to put Exchange/Distribution into the
> >>> Calcite library.
> >>>
> >>> Makes sense to me.  +1
> >>>
> >>>
> >>>
> >>> On Wed, Feb 11, 2015 at 12:45 PM, Julian Hyde <jhyde@apache.org> wrote:
> >>>
> >>>> Drill guys: What do you think of the proposal?
> >>>>
> >>>> On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <hashutosh@apache.org>
> >>>> wrote:
> >>>>
> >>>> Overall proposal sounds good to me. +1
> >>>>
> >>>> On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <jhyde@apache.org> wrote:
> >>>>
> >>>> I've had some discussions about adding an Exchange operator and
> >>>> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
> >>>> Ashutosh has logged a bug [
> >>>> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
> >>>> containing a proof-of-concept [
> >>>> https://github.com/apache/incubator-calcite/pull/52/files ].
> >>>>
> >>>> I know that Drill has a Distribution trait and several sub-classes of
> >>>> Exchange operator (DrillDistributionTrait, ExchangePrel,
> >>>> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
> >>>> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
> >>>> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
> >>>> )
> >>>>
> >>>> I propose to create a Distribution trait and Exchange operator base class
> >>>> in Calcite, with the goal that both Drill and Hive would use them. (I am
> >>>> adopting Drill terminology -- Distribution rather than Partition, Exchange
> >>>> rather than Shuffle -- but I am pretty sure that the concepts are the
> >>>> same.)
> >>>>
> >>>> public abstract class Exchange extends SingleRel {
> >>>>   public final RelDistribution distribution;
> >>>>
> >>>>   protected Exchange(RelOptCluster cluster, RelTraitSet traitSet,
> >>>>       RelNode input, RelDistribution distribution) {
> >>>>     super(cluster, traitSet, input);
> >>>>     this.distribution = distribution;
> >>>>   }
> >>>> }
> >>>>
> >>>> public interface RelDistribution extends RelMultipleTrait {
> >>>>   enum DistributionType {
> >>>>     SINGLETON,
> >>>>     HASH_DISTRIBUTED,
> >>>>     RANGE_DISTRIBUTED,
> >>>>     RANDOM_DISTRIBUTED,
> >>>>     ROUND_ROBIN_DISTRIBUTED,
> >>>>     BROADCAST_DISTRIBUTED
> >>>>   }
> >>>>
> >>>>   public DistributionType getType();
> >>>>   public ImmutableIntList getFields();
> >>>> }
> >>>>
> >>>> Calcite would not contain any particular exchange algorithms. However,
> >>>> since it is common to combine sort and exchange, I would create a base
> >>>> class for it:
> >>>>
> >>>> public abstract class SortExchange extends Exchange {
> >>>>   public final RelCollation collation;
> >>>>
> >>>>   ...
> >>>> }
> >>>>
> >>>> The physical operators would remain in Drill/Hive and would likely be fully
> >>>> specified by the distribution and collation; they would not need any
> >>>> additional attributes. We would not be able to port
> >>>> DrillDistributionTraitDef.convert directly -- it would create a
> >>>> LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
> >>>> LogicalSort) and then Drill rules would need to kick in to convert that to
> >>>> HashToRandomExchangePrel etc.
> >>>>
> >>>> I do not think that RelDistribution needs to be a "multiple" trait (compare
> >>>> with RelCollation extends RelMultipleTrait, which allows a RelNode to have
> >>>> more than one sort-order) but I may be wrong.
> >>>>
> >>>> The advantages of making Exchange a first-class operator and Distribution a
> >>>> trait are clear. We will be able to build a library of rules (e.g.
> >>>> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
> >>>> interface, and start working on stats and cost model.
> >>>>
> >>>> Drill and Hive stakeholders, please let me know what you think of this
> >>>> plan.
> >>>>
> >>>> Julian
> >>>>
> >>>
> >>>
> >>
>
>
