calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Shoshin <Alexander_Shos...@epam.com>
Subject Problems with abstract syntax tree
Date Mon, 31 Oct 2016 15:46:48 GMT
Hello, everybody.

Trying to resolve an Apache Flink issue I got some troubles with Calcite. Can you help me
to understand is there a problem in Calcite or just in wrong settings passed to Calcite functions?

I have a simple table "Words" with one column named "word" and a query with NOT IN operator:
val query = "SELECT word FROM Words WHERE word NOT IN (SELECT word FROM Words WHERE word =
'hello')"

This query parsed by org.apache.calcite.sql.parser.SqlParser.parseStmt() and then transformed
to a relational tree by org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(...).

As a result I see the following abstract syntax tree
LogicalProject(word=[$0])
  LogicalFilter(condition=[NOT(CASE(=($1, 0), false, IS NOT NULL($5), true, IS NULL($3), null,
<($2, $1), null, false))])
    LogicalJoin(condition=[=($3, $4)], joinType=[left])
      LogicalProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$0])
        LogicalJoin(condition=[true], joinType=[inner])
          EnumerableTableScan(table=[[Words]])
          LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
            LogicalProject($f0=[$0], $f1=[true])
              LogicalProject(word=[$0])
                LogicalFilter(condition=[=($0, 'hello')])
                  EnumerableTableScan(table=[[Words]])
      LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
        LogicalProject($f0=[$0], $f1=[true])
          LogicalProject(word=[$0])
            LogicalFilter(condition=[=($0, 'hello')])
              EnumerableTableScan(table=[[Words]])

which fails later during query plan optimization (while calling org.apache.calcite.tools.Programs.RuleSetProgram.run()).

I think it might be because of a very complex abstract syntax tree generated by Calcite. Shouldn't
it be more simple? This one looks good for me:
LogicalProject(word=[$0])
  LogicalFilter(condition=[IS NULL($2)])
    LogicalJoin(condition=[=($0, $1)], joinType=[left])
      EnumerableTableScan(table=[[Words]])
      LogicalProject($f0=[$0], $f1=[true])
        LogicalProject(word=[$0])
          LogicalFilter(condition=[=($0, 'hello')])
            EnumerableTableScan(table=[[Words]])

And when I write a query using LEFT OUTER JOIN to receive this syntax tree - the optimization
works fine. And the query execution result is the same as must be in case of using NOT IN.
So am I wrong with a supposition about bad abstract syntax tree or not? I will be glad to
receive any comments.

Regards,
Alexander

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message