calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chunwei Lei <chunwei.l...@gmail.com>
Subject Re: Correlate Join SemiJoin transformation
Date Wed, 05 Jun 2019 02:44:51 GMT
Thanks for raising this, Haisheng.

Do you mean that we should have better subquery unnesting?




Best,
Chunwei


On Tue, Jun 4, 2019 at 8:27 AM Haisheng Yuan <h.yuan@alibaba-inc.com> wrote:

> Hi,
>
> Since we have commited CALCITE-2969, and in the next release of 1.21, we
> will deprecate EquiJoin and make enumerable hash join support non-equi join
> condition, the optimization door is open to us.
>
> Currently Calcite often generates complicated and inefficient plan for
> boolean context correlated subquery (putting value context subquery aside,
> which is more complex during unnesting). e.g.
> https://issues.apache.org/jira/browse/CALCITE-2857
> https://issues.apache.org/jira/browse/CALCITE-2874
>
> There are many more cases if you want to try and find out. Some underlying
> reasons are:
> 1. SemJoin doesn't support non-equi condition
> 2. Correlate comes with Aggregate in many cases
>
> Sometimes, SubqueryRemoveRule generates Correlate with Aggregate,
> sometimes generates Join with Aggregate. Then we use SemiJoinRule to match
> Join(Rel, Agg) pattern to see whether we can transform it to semijoin or
> not. This is counter intuitive, because we already know it will be a
> SemiJoin or AntiSemi when we generate Correlate for the subquery. In
> current way, we may miss the chance to get back SemiJoin for it, and we
> almost lose the opportunity to do IndexScan if there is index available.
>
> In addition, I see 2 rules JoinAddRedundantSemiJoinRule and
> SemiJoinRemoveRule seem to provide the chance to do index scan. But through
> the test cases related with the 2 rules, I don't see any chance for a
> better plan with them and they don't seem to be able to generate plan with
> index scan. I doubt whether they are used at all. Can someone shed some
> light on this and give us some examples, if I get it wrong?
>
> The transformation flow for boolean context sub-query in my head is:
> sub-query -> Correate(Semi/AntiSemi) without Aggregate (SubqueryRemoveRule)
> Correlate(Semi/AntiSemi) -> NestedLoopJoin (correlate implementation rule)
> Correlate(Semi/AntiSemi) -> IndexedNestedLoopJoin (index apply rule)
> Correlate(Semi/AntiSemi) -> SemiJoin/AntiSemiJoin (Semi Correlate to Semi
> Join rule)
> SemiJoin -> InnerJoin(rel, Agg) (SemiJoin to Join rule)
> InnerJoin(rel, Agg) ->  InnerJoin(Agg, rel) (Join reorder rule)
> SemiJoin/InnerJoin -> HashJoin/MergeJoin/NLJ (Join implementation rule)
>
> This is a long shot, and will involve tons of changes.
> Any thoughts?
>
> - Haisheng Yuan

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message