calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jess Balint <jbal...@gmail.com>
Subject Re: how does SortRemoveRule work?
Date Sat, 15 Oct 2016 02:28:22 GMT
Is my test incorrect? This also fails:

  runDuplicateSortCheck("select empid "
                        + "from emps "
                        + "order by name ",
                        "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
                        + "  EnumerableProject(empid=[$0], name=[$2])\n"
                        + "    EnumerableTableScan(table=[[hr, emps]])\n");

The resulting plan is:

EnumerableProject(empid=[$0], name=[$2])
  EnumerableTableScan(table=[[hr, emps]])

No sort... The scan is not sorted. I get the same plan when changing the
sort to any other column.

Jess

On Fri, Oct 14, 2016 at 7:19 PM, Julian Hyde <jhyde@apache.org> wrote:

> If the sort is trivial (i.e. its input is already sorted) then the outcome
> of calling call.transformTo(…) will be to put the sort and its input into
> the same subset. Thus the input could be used as a (cheaper) alternative.
>
> If the sort is not trivial, the effect of the rule will be put the sort
> and its input in different RelSubsets of the same RelSet. They were
> probably in different RelSets before the rule was called, in which case it
> will trigger a merge of those sets: the planner now knows that any
> relational expression in either of those sets is now equivalent to any
> other.
>
> > On Oct 14, 2016, at 3:34 PM, Jess Balint <jbalint@gmail.com> wrote:
> >
> > On Fri, Oct 14, 2016 at 5:16 PM, Julian Hyde <jhyde@apache.org> wrote:
> >
> >> Note the important expression:
> >>
> >>  convert(sort.getInput(), traits)
> >>
> >> This creates a new relational expression (or returns an existing one)
> that
> >> is “input" sorted by the required fields. It doesn’t change “input”.
> >>
> >>
> > I dug into getOrCreateSubset/addAbstractConverters but I still don't see
> > how this could work.
> >
> >
> >> By the way, the Volcano paper talks about physical properties (what we
> >> call traits) and enforcers. An enforcer is a relational operator that
> >> doesn’t change the logical content of the data, just changes its
> physical
> >> properties.
> >>
> >>
> > Isn't the sort node acting as the enforcer here? I don't see a reason to
> > remove it and then add it back via convert(). I'm specifically looking at
> > the case where the sort is necessary (i.e. should not be removed by this
> > optimization).
> >
> >
> >> Every physical property has a corresponding enforcer: Sort is the
> enforcer
> >> of the Collation trait; Shuffle is the enforcer of the Distribution
> trait;
> >> Convert is the enforcer of the CallingConvention trait.
> >>
> >> The short piece of code you quoted may seem almost trivial, but it is
> >> stating something profound: the relationship between Sort and Collation.
> >>
> >> The trick, of course, is to find a way of achieving a sort order without
> >> using a sort (e.g. finding a copy of the table that contains the columns
> >> you want and is sorted in the order you want, i.e. an index). The whole
> >> reason we have the concept of physical properties is to make that kind
> of
> >> optimization (sometimes) possible.
> >>
> >>
> > I created a simple test:
> >
> >  @Test public void testTwoSortDontRemove() throws Exception {
> >    runDuplicateSortCheck("select empid+deptno from ( "
> >                          + "select empid, deptno "
> >                          + "from emps "
> >                          + "order by empid) "
> >                          + "order by deptno",
> >                          "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
> >                          + "  EnumerableProject(EXPR$0=[+($0, $1)],
> > deptno=[$1])\n"
> >                          + "    EnumerableSort(sort0=[$0], dir0=[ASC])\n"
> >                          + "      EnumerableProject(empid=[$0],
> > deptno=[$1])\n"
> >                          + "        EnumerableTableScan(table=[[hr,
> > emps]])\n");
> >  }
> >
> > When I run this, I get an error because the sort is removed:
> >
> > EnumerableProject(EXPR$0=[+($0, $1)], deptno=[$1])
> >  EnumerableSort(sort0=[$0], dir0=[ASC])
> >    EnumerableProject(empid=[$0], deptno=[$1])
> >      EnumerableTableScan(table=[[hr, emps]])
> >
> > I changed the original method in question to only remove the sort if the
> > requested collation matches that of it's input:
> >
> >    if
> > (sort.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE)
> .equals(collation))
> > {
> >      final RelTraitSet traits =
> > sort.getInput().getTraitSet().replace(collation);
> >      call.transformTo(convert(sort.getInput(), traits));
> >    }
> >
> > This will cause the test to pass. Isn't this a bug?
> >
> > Jess
> >
> >
> >> Julian
> >>
> >>
> >>> On Oct 14, 2016, at 2:33 PM, Jess Balint <jbalint@gmail.com> wrote:
> >>>
> >>> Just doing a bit of code reading here. Looking at the SortRemoveRule
> >>> implementation:
> >>>
> >>>   // Express the "sortedness" requirement in terms of a collation trait
> >>> and
> >>>   // we can get rid of the sort. This allows us to use rels that just
> >>> happen
> >>>   // to be sorted but get the same effect.
> >>>   final RelCollation collation = sort.getCollation();
> >>>   assert collation == sort.getTraitSet()
> >>>       .getTrait(RelCollationTraitDef.INSTANCE);
> >>>   final RelTraitSet traits =
> >>> sort.getInput().getTraitSet().replace(collation);
> >>>   call.transformTo(convert(sort.getInput(), traits));
> >>>
> >>> I don't understand how this is correct without checking that the input
> >>> collation is the same as the collation requested by the sort. It looks
> to
> >>> me like it just replaces the input's collation with that of the sort.
> Any
> >>> thoughts?
> >>>
> >>> Thanks.
> >>>
> >>> Jess
> >>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message