calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: Transitioning rules from ProjectFactory etc. to RelBuilder
Date Fri, 25 Sep 2015 20:07:19 GMT
The problem of making rules extensible is not straightforward. The solution has to be a mixture
of API changes and best practices. In this change I am making API changes and fixing up the
existing rules to match, but I am not revisiting the rules to introduce best practices.

The API change is that every RelOptRule has a relBuilderFactory. We strongly recommend that
if a rule needs to create a relational expression it either matches the type of its inputs
(calling the copy method) or it uses the factory. That means removing calls in rules to methods
such as LogicalProject.create or RelOptUtil.project. I’ve done quite a lot of this as part
of this change.

The best practice is that a rule’s constructor should expose RelBuilderFactory and Class
parameters so that the rule can be re-used. Quite a few rules do this, but a lot do not. Quite
a few rules are re-usable in principle, but have not been tested against other RelNode sub-classes,
and quite a few rules are just not re-usable. So I propose that we enforce best practices
when reviewing contributions but don’t spend a lot of effort upgrading the existing rules.

In some cases (e.g. WindowedAggRelSplitter in ProjectToWindowRule) the rule uses the default
factory but I’ve changed the code to use a RelBuilder. So, there’s no change in behavior,
but if someone were to add a RelBuilderFactory to the rule’s constructor, they would reap
the benefit.

Julian


 
> On Sep 24, 2015, at 6:29 AM, Jesus Camacho Rodriguez <jcamachorodriguez@hortonworks.com>
wrote:
> 
> In our case, we had to do some copy-pasting of rules in the past in Hive.
> Although the reasons have been already discussed in the thread, just to
> confirm:
> - Rules that use a default factory to create operators e.g.
> LogicalProject, instead of being able to provide our own factory. An
> example is AggregateProjectMergeRule. (Instead of copy/pasting the rule,
> we could apply another rule to transform LogicalProject into HiveProject,
> but this is not the right way either).
> - Rules that match directly on Logical instances of the operators. An
> example is SortProjectTransposeRule.
> 
> I see that with the solutions that we are discussing, those two problems
> would be solved.
> 
> About including the matching classes in RelBuilder, it can be a good idea
> if we still make the rules extensible i.e. we can provide our own matching
> classes if we want to. Assume a project that might subclass the Project
> operator into two different operators, then would like to match a certain
> rule only to one of them. Further, I guess using the copy methods provided
> by the operators when possible, instead of calling directly to the
> Factory, would make the rules more extensible for these cases.
> 
> --
> Jesús
> 
> 
> On 9/23/15, 11:08 PM, "Jinfeng Ni" <jinfengni99@gmail.com> wrote:
> 
>> I did a quick check of Drill side rule, and did not find the copy/paste
>> code
>> simply because we have to specify the rule's operands.
>> 
>> The reason I'm asking this is because many Calcite rules use core
>> Filter.class,
>> Project.class etc.  Currently, Drill simply uses some of Calcite rules
>> directly.
>> However, that means the rule will fire against both LogicalFilter, and
>> DrillFilter,
>> which seems to be redundant.  I understand we could extend Calcite rule to
>> make sure it matches only one of them. But I'm looking for a bit more
>> direct way
>> to do that.
>> 
>> For instance, in [2] we uses Calcite's FilterMergeRule directly, which
>> would match
>> both LogicalFilter and DrillFilter.
>> 
>> Also, I feel the rule patten specification can be regarded as the
>> input type for the
>> rule, while RelBuilder is to address the output type from the rule's
>> firing. That's why
>> I feel they had better be specified together in some way.
>> 
>> 
>> [1] 
>> https://github.com/apache/incubator-calcite/blob/master/core/src/main/java
>> /org/apache/calcite/rel/rules/FilterMergeRule.java#L46
>> 
>> [2] 
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o
>> rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121
>> 
>> On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <jacques@apache.org>
>> wrote:
>>> I think this would be a good place to have a little bit more design,
>>> discussion and validation. Julian, it seems like you've done a lot of
>>> thinking and it would be good to have a better understanding of the
>>> choices
>>> you're making. (Your response below seems to be just scratching the
>>> surface).
>>> 
>>> If the goal is to minimize copy and paste rules, it would be useful to
>>> test
>>> the proposal against example rules that have been copy/pasted and see
>>> what
>>> other patterns might emerge. At first glance, I'm inclined more with
>>> Jinfeng's proposal regarding operands than yours. Your decision about
>>> copy/not behavior should also be discussed further.
>>> 
>>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set
>>> of
>>> rules that we've needed to copy/paste due to constraints around rule
>>> configuration to make sure the outcome of these changes is as desired?
>>> 
>>> 
>>> 
>>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <jhyde@apache.org> wrote:
>>> 
>>>> Great question.
>>>> 
>>>> For that case, I think you should carry on making a separate rule
>>>> instance with operands that contain classes. RelBuilder is a good
>>>> single point to put a lot of the configuration, but it is taking it
>>>> too far to put the target classes in there. I don't want its role as
>>>> "point for configuration for rules" to overwhelm its main role "an
>>>> easy way to build relational algebra expressions".
>>>> 
>>>> And besides, operands are powerful and descriptive, and because the
>>>> engine understands them it can short-cut which rules are fired.
>>>> 
>>>> For what it's worth, there's another piece of configuration that I
>>>> considered putting into RelBuilder but decided not to. That is,
>>>> whether a rule instance should call RelNode.copy to automatically
>>>> create RelNodes the same type as its inputs. See I introduced
>>>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>>> 
>>>> 
>>>> 
>>>> https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4
>>>> cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi
>>>> lterProjectTransposeRule.java#L67
>>>> 
>>>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>>>> .shouldCopyFilter().
>>>> 
>>>> And by the way I also considered adding methods to RelBuilderFactory
>>>> (formerly known as ProtoRelBuilder). Adding methods there would not
>>>> "pollute" RelBuilder as badly.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <jinfengni99@gmail.com>
>>>> wrote:
>>>>> +1.
>>>>> 
>>>>> I have one question.  RelBuilder seems to be used in onMatch()
>>>> method,
>>>>> when a particular type of RelNode has to be created.  What about
>>>>> RelOptRuleOperand in the constructor of rule? Let's say if we want to
>>>>> rule to match only DrillProject, or HiveFilter. The current way
>>>> seems to
>>>>> be that we extend the rule by providing a different
>>>> RelOptRuleOperand.
>>>>> 
>>>>> Can RelBuilder also have getFilterClass(), getProjectClass() etc,
>>>> which
>>>>> would be used to specify the rule patten matching?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <jacques@apache.org>
>>>> wrote:
>>>>>> Agree on the utility of this. +1.
>>>>>> 
>>>>>> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>>>>>> jpullokkaran@hortonworks.com> wrote:
>>>>>> 
>>>>>>> This is useful, though Hive will have to do some refactoring.
>>>>>>> 
>>>>>>> In past Hive had to copy bunch of rules to work around these
>>>> issues.
>>>>>>> Rules & relnodes for org.apache.calcite.rel.logical.* is
a good
>>>> example of
>>>>>>> this.
>>>>>>> 
>>>>>>> 
>>>>>>> John
>>>>>>> 
>>>>>>> On 9/22/15, 5:14 PM, "Julian Hyde" <jhyde@apache.org> wrote:
>>>>>>> 
>>>>>>>> I have started work on
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-828 and my
>>>>>>>> work-in-progress is in
>>>>>>>> 
>>>> 
>>>> https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder
>>>> .
>>>>>>>> 
>>>>>>>> This will be a big change to teams such as Hive and Drill.
Now
>>>> would
>>>>>>>> be a good time to chime in.
>>>>>>>> 
>>>>>>>> The plan is that every rule instance has a ProtoRelBuilder
field
>>>> from
>>>>>>>> which a RelBuilder can be created to be used in an onMatch
>>>> method. The
>>>>>>>> RelBuilder contains factories for every kind of RelNode,
so we
>>>> won't
>>>>>>>> have to keep plumbing new factories in to existing rules.
>>>>>>>> 
>>>>>>>> There will be other advantages. RelBuilder is an easier and
more
>>>>>>>> concise way of creating relational expressions. It does some
>>>> useful
>>>>>>>> stuff for free, like flattening ANDs and eliminating identity
>>>>>>>> projections. I'm seeing the volume of code decrease and a
few
>>>> minor
>>>>>>>> plan improvements.
>>>>>>>> 
>>>>>>>> I haven't yet found a good way to deal with the pattern where
if,
>>>> say,
>>>>>>>> ProjectFactory is null the rule is to use Project.copy to
create a
>>>>>>>> Project of the same type.
>>>>>>>> 
>>>>>>>> I'm keeping & deprecating the old rule constructors that
take a
>>>>>>>> variety of XxxFactory arguments.
>>>>>>>> 
>>>>>>>> As always, we try to keep API compatibility and mark the
old API
>>>>>>>> 
>>>>>>>> @Deprecated // to be removed before 2.0
>>>>>>>> 
>>>>>>>> but the deprecated APIs are no longer tested and are likely
to be
>>>>>>>> flaky. We strongly suggest that you fix calls to deprecated
APIs
>>>> as
>>>>>>>> soon as you upgrade to a more recent version of Calcite.
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>> 
> 


Mime
View raw message