calcite-dev mailing list archives

From Aman Sinha <asi...@maprtech.com>
Subject Re: Pushing a join condition below a LogicalCorrelate
Date Tue, 12 May 2015 03:43:54 GMT
Apart from the JoinType, Correlate would also need a 'condition' field to
represent a join condition, because FilterJoinRule relies on placing the
join condition on the join node during filter pushdown.

Summarizing the alternatives:
1. Have a completely separate implementation of Correlate-specific rules.
   This has the obvious disadvantage of redundant code. Also, it is unlikely
   that methods such as classifyFilters() would work seamlessly with the
   Correlate-specific rules.
2. The redundant code in #1 can be mitigated by creating base classes for
   some of the rules and having the Join-specific and Correlate-specific
   rules share the code.
3. Modify Correlate to have JoinType and SemiJoinType as well as 'condition'.
   In this sense, it gets closer to a Join without actually being a derived
   class of Join. The FilterJoinRule and similar rules would be modified to
   use 'BiRel' instead of 'Join', since BiRel is the base class for both
   Join and Correlate.
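
To make alternative 3 a bit more concrete, here is a minimal sketch in plain
Java. Everything in it is a simplified stand-in invented for illustration:
TwoInput, ToyJoin, ToyCorrelate, Kind and pushIntoCondition() are hypothetical
names, not Calcite classes or methods, and conditions are modelled as lists of
strings. It only shows the shape of the idea: once both node types carry a
condition on a common base type, one rule body can serve both. The INNER-only
check is an assumption of this sketch, added because a filter above an outer
node cannot simply be merged into its condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedRuleSketch {
    /** Toy join kinds; the real enums have more members. */
    enum Kind { INNER, LEFT }

    /** Hypothetical common base for two-input nodes (a stand-in, not Calcite's BiRel). */
    static abstract class TwoInput {
        final Kind kind;
        // Conjuncts of the node's condition; an empty list means TRUE.
        final List<String> condition = new ArrayList<>();

        TwoInput(Kind kind) {
            this.kind = kind;
        }
    }

    static final class ToyJoin extends TwoInput {
        ToyJoin(Kind kind) {
            super(kind);
        }
    }

    static final class ToyCorrelate extends TwoInput {
        ToyCorrelate(Kind kind) {
            super(kind);
        }
    }

    /**
     * One rule body for both node types. For an INNER node the filter's
     * conjuncts are merged into the node's condition; otherwise they are
     * left alone. Returns the conjuncts that must stay in the filter.
     */
    static List<String> pushIntoCondition(List<String> filterConjuncts, TwoInput node) {
        if (node.kind != Kind.INNER) {
            return new ArrayList<>(filterConjuncts);
        }
        node.condition.addAll(filterConjuncts);
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> filter = Arrays.asList("=($7, $9)");

        ToyJoin join = new ToyJoin(Kind.INNER);
        ToyCorrelate correlate = new ToyCorrelate(Kind.INNER);

        // The same method accepts either node because it is typed on the base class.
        pushIntoCondition(filter, join);
        pushIntoCondition(filter, correlate);

        System.out.println(join.condition);
        System.out.println(correlate.condition);
    }
}
```

The trade-off is the one described in the list: the rule no longer needs a
Correlate-specific copy, but Correlate has to grow the fields the rule reads.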

To Julian's question about the list of rules affected: it seems most of
the *Join*Rules would need examination, otherwise we could miss certain
optimizations. However, we would get the most bang for the buck by
focusing on FilterJoinRule, so I would like to get that taken care of
first.
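
As a rough illustration of what the FilterJoinRule-style step has to decide
for the plan quoted below, here is a small self-contained Java sketch. It is
not Calcite code: Conjunct and leftOnly() are hypothetical helpers, and each
conjunct is reduced to its text plus the field ordinals it references. The
field counts are read off the quoted plan: the left input of the
LogicalCorrelate (the EMP/DEPT join) supplies $0..$10 and the right input
(the aggregate) supplies $11. A conjunct that touches only left-side fields
can move below the LEFT correlate, since the left input is the preserved
side; anything referencing $11 has to stay above it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConjunctClassifier {
    /** A filter conjunct reduced to its text and the field ordinals it uses (hypothetical helper). */
    static final class Conjunct {
        final String text;
        final int[] fields;

        Conjunct(String text, int... fields) {
            this.text = text;
            this.fields = fields;
        }
    }

    /** Returns the conjuncts whose fields all have ordinals below leftFieldCount. */
    static List<String> leftOnly(List<Conjunct> conjuncts, int leftFieldCount) {
        List<String> result = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            boolean allLeft = true;
            for (int f : c.fields) {
                if (f >= leftFieldCount) {
                    allLeft = false;
                }
            }
            if (allLeft) {
                result.add(c.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two conjuncts of AND(=($7, $9), >($5, $11)) from the quoted plan.
        List<Conjunct> conjuncts = Arrays.asList(
                new Conjunct("=($7, $9)", 7, 9),
                new Conjunct(">($5, $11)", 5, 11));

        // The left input contributes 11 fields ($0..$10).
        System.out.println(leftOnly(conjuncts, 11)); // prints [=($7, $9)]
    }
}
```

Once =($7, $9) sits directly above the LogicalJoin, the existing Join-based
rule should be able to fold it into the join condition.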

Aman


On Mon, May 11, 2015 at 7:06 PM, Julian Hyde <julianhyde@gmail.com> wrote:

> Seems a bit of a stretch, since Join has other ways to represent SEMI and
> ANTI. Maybe a Correlate could have both a JoinType and a SemiJoinType?
>
> Can you & Vladimir find a compromise for how to restore the missing
> functionality with no more copy-paste than necessary? It would help if we
> had a full list of rules which ought to work for Correlate.
>
> Julian
>
> On May 11, 2015, at 5:27 PM, Jinfeng Ni <jni@apache.org> wrote:
>
> > Can we extend Join.JoinType, so that it includes the SemiJoinType (SEMI,
> > ANTI) represented by Correlate? That way, we could leverage the rules for
> > Join and apply them to Correlate as well, just like the way it used to
> > work. Otherwise, we have to come up with a new set of rules for Correlate,
> > to make things work again.
> >
> >
> >
> > On Mon, May 11, 2015 at 5:02 PM, Julian Hyde <julian@hydromatic.net> wrote:
> >
> >> This comment in Correlate seems to express Vladimir’s motivation:
> >>
> >>> Correlate is not a join since: typical rules should not match Correlate.
> >>
> >> I agree with him. For instance, Correlate.joinType is enum SemiJoinType {
> >> INNER, LEFT, SEMI, ANTI } and therefore has different semantics to
> >> Join.joinType.
> >>
> >> It’s unfortunate that FilterJoinRule got broken. We should fix it. Any
> >> other rules that would be needed? Probably ProjectJoinTransposeRule,
> >> AggregateJoinTransposeRule.
> >>
> >> Julian
> >>
> >>
> >> On May 11, 2015, at 4:17 PM, Aman Sinha <asinha@maprtech.com> wrote:
> >>
> >>> As part of CALCITE-483, the class hierarchy of CorrelateRel was changed
> >>> such that the new LogicalCorrelate is not a derived class of Join anymore.
> >>> Thus, any rule such as FilterJoinRule that used to push the filter down
> >>> into the Join (or a derived class of Join) does not apply anymore for the
> >>> LogicalCorrelate.
> >>>
> >>> I am continuing down the path of my proposal to have a version of the push
> >>> filter rule that allows pushing into/past a LogicalCorrelate. But perhaps
> >>> Vladimir can shed some light on the motivation for changing the class
> >>> hierarchy.
> >>>
> >>> thanks,
> >>> Aman
> >>>
> >>>
> >>> On Mon, May 11, 2015 at 10:21 AM, Aman Sinha <asinha@maprtech.com> wrote:
> >>>
> >>>> Note that I have made some changes to the decorrelation logic to call
> >>>> findBestExp() *after* the decorrelation is done and supply it the set of
> >>>> rules including FilterJoinRule. This does push the join condition into one
> >>>> part of the tree but it does not push it into all other parts where that
> >>>> join may have been copied during decorrelation. The main point is: we
> >>>> need to do the filter pushdown early rather than late.
> >>>>
> >>>> Aman
> >>>>
> >>>> On Mon, May 11, 2015 at 10:16 AM, Aman Sinha <asinha@maprtech.com> wrote:
> >>>>
> >>>>> I want to be able to push the join condition (=($7, $9)) highlighted into
> >>>>> the LogicalJoin that is right below the LogicalCorrelate. What's the right
> >>>>> way to do it?
> >>>>>
> >>>>> The current method of first decorrelating and then pushing the filter
> >>>>> (via the FilterJoinRule) is not quite right because once decorrelation is
> >>>>> done, it may be too late to push the filter into the join. During
> >>>>> decorrelation we take that LogicalJoin (with its TRUE condition) and push
> >>>>> it into other places - for instance we call createDistinct() to build a
> >>>>> distinct row set on the result of this join but since the join has a true
> >>>>> condition, the distinct is created on a cartesian join.
> >>>>>
> >>>>> What I really need is something like a FilterJoinRule that allows pushing
> >>>>> it past a LogicalCorrelate.
> >>>>>
> >>>>> LogicalProject(EXPR$0=[1]): rowcount = 1.0, cumulative cost = 10.25, id = 53
> >>>>>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9], NAME=[$10], EXPR$0=[$11]): rowcount = 1.0, cumulative cost = 9.25, id = 71
> >>>>> *   LogicalFilter(condition=[AND(=($7, $9), >($5, $11))]): rowcount = 1.0, cumulative cost = 8.25, id = 68*
> >>>>>       LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{0}]): rowcount = 1.0, cumulative cost = 7.25, id = 61
> >>>>>         LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0, cumulative cost = 1.0, id = 42
> >>>>>           LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 11
> >>>>>           LogicalTableScan(table=[[CATALOG, SALES, DEPT]]): rowcount = 1.0, cumulative cost = 0.0, id = 12
> >>>>>         LogicalAggregate(group=[{}], EXPR$0=[AVG($5)]): rowcount = 1.0, cumulative cost = 2.125, id = 47
> >>>>>           LogicalFilter(condition=[=($cor0.EMPNO, $0)]): rowcount = 1.0, cumulative cost = 1.0, id = 45
> >>>>>             LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 14
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Aman
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>
