calcite-dev mailing list archives

From Aman Sinha <asi...@maprtech.com>
Subject Re: Pushing a join condition below a LogicalCorrelate
Date Tue, 12 May 2015 03:46:21 GMT
Based on the discussion so far, it seems we would want to go with option
#3. Let me know if there are potential problems with that approach.

Aman

On Mon, May 11, 2015 at 8:43 PM, Aman Sinha <asinha@maprtech.com> wrote:

> Apart from the JoinType, Correlate would also need to have the
> 'condition' to represent a join condition, because FilterJoinRule relies
> on placing the join condition on the join node during filter pushdown.
>
> Summarizing the alternatives:
> 1. Have a completely separate implementation of Correlate-specific rules.
>    This has the obvious disadvantage of redundant code. Also, it is
>    unlikely that methods such as classifyFilters() would work seamlessly
>    with the Correlate-specific rules.
> 2. Mitigate the redundant code in #1 by creating base classes for some of
>    the rules, so that the Join-specific and Correlate-specific rules share
>    the code.
> 3. Modify Correlate to have a JoinType, a SemiJoinType and a 'condition'.
>    In this sense it gets closer to a Join without actually being a derived
>    class of Join. FilterJoinRule and similar rules would be modified to
>    use BiRel instead of Join, since BiRel is the base class of both Join
>    and Correlate.
>
> To Julian's question about the list of rules affected: it seems most of
> the *Join*Rules would need examination, otherwise we could miss certain
> optimizations. However, we would get the most bang for the buck by
> focusing on FilterJoinRule, so I would like to get that taken care of
> first.
>
> Aman
>
>
> On Mon, May 11, 2015 at 7:06 PM, Julian Hyde <julianhyde@gmail.com> wrote:
>
>> Seems a bit of a stretch, since Join has other ways to represent SEMI and
>> ANTI. Maybe a Correlate could have both a JoinType and a SemiJoinType?
>>
>> Can you & Vladimir find a compromise for how to restore the missing
>> functionality with no more copy-paste than necessary? It would help if we
>> had a full list of rules which ought to work for Correlate.
>>
>> Julian
>>
>> On May 11, 2015, at 5:27 PM, Jinfeng Ni <jni@apache.org> wrote:
>>
>> > Can we extend Join.JoinType so that it includes the SemiJoinType values
>> > (SEMI, ANTI) represented by Correlate? That way, we could leverage the
>> > rules for Join and apply them to Correlate as well, just like the way it
>> > used to work. Otherwise, we have to come up with a new set of rules for
>> > Correlate to make things work again.
>> >
>> >
>> >
>> > On Mon, May 11, 2015 at 5:02 PM, Julian Hyde <julian@hydromatic.net> wrote:
>> >
>> >> This comment in Correlate seems to express Vladimir’s motivation:
>> >>
>> >>> Correlate is not a join since: typical rules should not match Correlate.
>> >>
>> >> I agree with him. For instance, Correlate.joinType is enum SemiJoinType {
>> >> INNER, LEFT, SEMI, ANTI } and therefore has different semantics from
>> >> Join.joinType.
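Julian's point about differing semantics can be made concrete with a small Java sketch. It assumes two hypothetical enums: one mirroring Correlate's SemiJoinType as quoted above, and one standing in for Join's join type, assumed here to be INNER, LEFT, RIGHT and FULL. Only two of the four SemiJoinType values have a Join counterpart, which is why a Join rule cannot simply treat a Correlate as a Join; extending Join's type with SEMI and ANTI, as Jinfeng suggests, would remove the empty cases at the cost of every Join rule having to handle them.

```java
import java.util.Optional;

/** Hypothetical mapping between the two join-type enums discussed here. */
public final class JoinTypeMapping {

  /** Stand-in for Join's join type (assumed values). */
  enum JoinType { INNER, LEFT, RIGHT, FULL }

  /** Stand-in for Correlate's SemiJoinType, as quoted in the thread. */
  enum SemiJoinType { INNER, LEFT, SEMI, ANTI }

  /** Returns the equivalent Join type, or empty when none exists. */
  static Optional<JoinType> toJoinType(SemiJoinType type) {
    switch (type) {
      case INNER:
        return Optional.of(JoinType.INNER);
      case LEFT:
        return Optional.of(JoinType.LEFT);
      default:
        // SEMI and ANTI have no counterpart among the Join types above.
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    for (SemiJoinType type : SemiJoinType.values()) {
      String mapped = toJoinType(type).map(JoinType::name).orElse("(none)");
      System.out.println(type + " -> " + mapped);
    }
  }
}
```

A rule written against such a mapping would fire on INNER and LEFT correlates and skip SEMI and ANTI unless it has dedicated logic for them.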
>> >>
>> >> It’s unfortunate that FilterJoinRule got broken. We should fix it. Any
>> >> other rules that would be needed? Probably ProjectJoinTransposeRule,
>> >> AggregateJoinTransposeRule.
>> >>
>> >> Julian
>> >>
>> >>
>> >> On May 11, 2015, at 4:17 PM, Aman Sinha <asinha@maprtech.com> wrote:
>> >>
>> >>> As part of CALCITE-483, the class hierarchy of CorrelateRel was changed
>> >>> such that the new LogicalCorrelate is no longer a derived class of Join.
>> >>> Thus, any rule such as FilterJoinRule that used to push the filter down
>> >>> into the Join (or a derived class of Join) no longer applies to
>> >>> LogicalCorrelate.
>> >>>
>> >>> I am continuing down the path of my proposal to have a version of the
>> >>> push filter rule that allows pushing into/past a LogicalCorrelate. But
>> >>> perhaps Vladimir can shed some light on the motivation for changing the
>> >>> class hierarchy.
>> >>>
>> >>> thanks,
>> >>> Aman
>> >>>
>> >>>
>> >>> On Mon, May 11, 2015 at 10:21 AM, Aman Sinha <asinha@maprtech.com> wrote:
>> >>>
>> >>>> Note that I have made some changes to the decorrelation logic to call
>> >>>> findBestExp() *after* the decorrelation is done and supply it the set
>> >>>> of rules including FilterJoinRule. This does push the join condition
>> >>>> into one part of the tree, but not into all the other parts where that
>> >>>> join may have been copied during decorrelation. The main point is: we
>> >>>> need to do the filter pushdown early rather than late.
>> >>>>
>> >>>> Aman
>> >>>>
>> >>>> On Mon, May 11, 2015 at 10:16 AM, Aman Sinha <asinha@maprtech.com> wrote:
>> >>>>
>> >>>>> I want to be able to push the highlighted join condition (=($7, $9))
>> >>>>> into the LogicalJoin that is right below the LogicalCorrelate. What's
>> >>>>> the right way to do it?
>> >>>>>
>> >>>>> The current method of first decorrelating and then pushing the filter
>> >>>>> (via FilterJoinRule) is not quite right, because once decorrelation is
>> >>>>> done it may be too late to push the filter into the join. During
>> >>>>> decorrelation we take that LogicalJoin (with its TRUE condition) and
>> >>>>> push it into other places; for instance, we call createDistinct() to
>> >>>>> build a distinct row set on the result of this join, but since the
>> >>>>> join has a TRUE condition, the distinct is created on a cartesian join.
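The cost of building a distinct set over a join with a TRUE condition can be seen in a toy Java example. It assumes made-up EMP rows {EMPNO, DEPTNO} and DEPT rows {DEPTNO}, and counts distinct (EMPNO, DEPT.DEPTNO) keys with and without the condition =($7, $9), i.e. EMP.DEPTNO = DEPT.DEPTNO. This is not what createDistinct() does internally; it only shows the row-count effect.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Toy illustration (made-up data): a distinct set built over a join with a
 * TRUE condition is the distinct set of a cartesian product.
 */
public final class CartesianDistinct {
  /** Hypothetical EMP rows as {EMPNO, DEPTNO}. */
  static final int[][] EMP = {{100, 10}, {110, 10}, {120, 20}};
  /** Hypothetical DEPT rows as {DEPTNO}. */
  static final int[][] DEPT = {{10}, {20}, {30}};

  /**
   * Collects distinct (EMPNO, DEPT.DEPTNO) keys from EMP joined to DEPT.
   * With applyCondition == false the join condition is TRUE (cartesian);
   * otherwise EMP.DEPTNO = DEPT.DEPTNO, the =($7, $9) of the plan, is applied.
   */
  static Set<String> distinctKeys(boolean applyCondition) {
    Set<String> keys = new LinkedHashSet<>();
    for (int[] emp : EMP) {
      for (int[] dept : DEPT) {
        if (!applyCondition || emp[1] == dept[0]) {
          keys.add(emp[0] + ":" + dept[0]);
        }
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    System.out.println("TRUE condition: " + distinctKeys(false).size() + " distinct keys");
    System.out.println("with =($7, $9): " + distinctKeys(true).size() + " distinct keys");
  }
}
```

With three EMP rows and three DEPT rows the TRUE-condition case yields 9 keys versus 3; the gap grows with the product of the two table sizes, and every place decorrelation copies the join inherits it.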
>> >>>>>
>> >>>>> What I really need is something like a FilterJoinRule that allows
>> >>>>> pushing it past a LogicalCorrelate.
>> >>>>>
>> >>>>> LogicalProject(EXPR$0=[1]): rowcount = 1.0, cumulative cost = 10.25, id = 53
>> >>>>>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9], NAME=[$10], EXPR$0=[$11]): rowcount = 1.0, cumulative cost = 9.25, id = 71
>> >>>>>     *LogicalFilter(condition=[AND(=($7, $9), >($5, $11))]): rowcount = 1.0, cumulative cost = 8.25, id = 68*
>> >>>>>       LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{0}]): rowcount = 1.0, cumulative cost = 7.25, id = 61
>> >>>>>         LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0, cumulative cost = 1.0, id = 42
>> >>>>>           LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 11
>> >>>>>           LogicalTableScan(table=[[CATALOG, SALES, DEPT]]): rowcount = 1.0, cumulative cost = 0.0, id = 12
>> >>>>>         LogicalAggregate(group=[{}], EXPR$0=[AVG($5)]): rowcount = 1.0, cumulative cost = 2.125, id = 47
>> >>>>>           LogicalFilter(condition=[=($cor0.EMPNO, $0)]): rowcount = 1.0, cumulative cost = 1.0, id = 45
>> >>>>>             LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 14
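The split Aman is after can be sketched in Java, assuming each conjunct is given together with the set of input fields it references. This loosely mirrors the idea behind the classifyFilters() method mentioned above; the Conjunct class and classify() method are hypothetical, not Calcite API. In the plan above, the LogicalJoin below the LogicalCorrelate produces fields $0 to $10 (EMP $0 to $8, DEPT $9 and $10) and the LogicalAggregate supplies $11, so =($7, $9) touches only the left input while >($5, $11) does not.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical helper (not Calcite API): splits the conjuncts of a filter that
 * sits above a two-input node by the input fields each conjunct references.
 */
public final class ConjunctClassifier {

  /** One conjunct of the filter condition and the input fields it uses. */
  static final class Conjunct {
    final String text;
    final BitSet refs = new BitSet();

    Conjunct(String text, int... fields) {
      this.text = text;
      for (int field : fields) {
        refs.set(field);
      }
    }
  }

  /**
   * Returns two lists: conjuncts that reference only fields in
   * [0, leftFieldCount) and may move into the left input, and the rest,
   * which must stay above the node.
   */
  static List<List<String>> classify(List<Conjunct> conjuncts, int leftFieldCount) {
    List<String> pushToLeft = new ArrayList<>();
    List<String> keepAbove = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      // nextSetBit returns -1 when no referenced field is >= leftFieldCount,
      // i.e. the conjunct does not touch the right input.
      if (c.refs.nextSetBit(leftFieldCount) < 0) {
        pushToLeft.add(c.text);
      } else {
        keepAbove.add(c.text);
      }
    }
    return List.of(pushToLeft, keepAbove);
  }

  public static void main(String[] args) {
    // Left input (the LogicalJoin) has 11 fields: EMP $0..$8, DEPT $9..$10.
    // The right input (the LogicalAggregate) contributes $11.
    List<Conjunct> filter = List.of(
        new Conjunct("=($7, $9)", 7, 9),
        new Conjunct(">($5, $11)", 5, 11));
    List<List<String>> split = classify(filter, 11);
    System.out.println("push into left input: " + split.get(0));
    System.out.println("keep above correlate: " + split.get(1));
  }
}
```

Because the LogicalCorrelate is a LEFT correlate, only predicates on its preserved (left) side may move below it. Once =($7, $9) sits in the left input, an ordinary FilterJoinRule can merge it into the inner LogicalJoin's TRUE condition, while >($5, $11) stays above because it reads the aggregate's output.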
>> >>>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Aman
>> >>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>
