calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <>
Subject Re: Problems with abstract syntax tree
Date Mon, 31 Oct 2016 19:42:37 GMT
The behavior of NOT IN in SQL is complicated when there are NULL
values around. In particular, if one "word" value from the sub-query
is null, then the outer query must return 0 rows. (Why? Because "word
NOT IN ('foo', 'bar' null)" would evaluate to UNKNOWN for every row.)

It is valid to deduce that "word" in the sub-query is never null,
because of the "WHERE word = 'hello'" condition. I would have hoped
that a constant reduction could do that, and then maybe the CASE
expression can be simplified.

By the way, to be pedantic, what we are talking about here is the
RelNode tree, the relational algebra, which comes out of the
SqlToRelConverter. The AST is the SqlNode tree, which comes out of the
parser and goes into the SqlToRelConverter.

On Mon, Oct 31, 2016 at 8:46 AM, Alexander Shoshin
<> wrote:
> Hello, everybody.
> Trying to resolve an Apache Flink issue I got some troubles with Calcite. Can you help
me to understand is there a problem in Calcite or just in wrong settings passed to Calcite
> I have a simple table "Words" with one column named "word" and a query with NOT IN operator:
> val query = "SELECT word FROM Words WHERE word NOT IN (SELECT word FROM Words WHERE word
= 'hello')"
> This query parsed by org.apache.calcite.sql.parser.SqlParser.parseStmt() and then transformed
to a relational tree by org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(...).
> As a result I see the following abstract syntax tree
> LogicalProject(word=[$0])
>   LogicalFilter(condition=[NOT(CASE(=($1, 0), false, IS NOT NULL($5), true, IS NULL($3),
null, <($2, $1), null, false))])
>     LogicalJoin(condition=[=($3, $4)], joinType=[left])
>       LogicalProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$0])
>         LogicalJoin(condition=[true], joinType=[inner])
>           EnumerableTableScan(table=[[Words]])
>           LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>             LogicalProject($f0=[$0], $f1=[true])
>               LogicalProject(word=[$0])
>                 LogicalFilter(condition=[=($0, 'hello')])
>                   EnumerableTableScan(table=[[Words]])
>       LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         LogicalProject($f0=[$0], $f1=[true])
>           LogicalProject(word=[$0])
>             LogicalFilter(condition=[=($0, 'hello')])
>               EnumerableTableScan(table=[[Words]])
> which fails later during query plan optimization (while calling
> I think it might be because of a very complex abstract syntax tree generated by Calcite.
Shouldn't it be more simple? This one looks good for me:
> LogicalProject(word=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
>     LogicalJoin(condition=[=($0, $1)], joinType=[left])
>       EnumerableTableScan(table=[[Words]])
>       LogicalProject($f0=[$0], $f1=[true])
>         LogicalProject(word=[$0])
>           LogicalFilter(condition=[=($0, 'hello')])
>             EnumerableTableScan(table=[[Words]])
> And when I write a query using LEFT OUTER JOIN to receive this syntax tree - the optimization
works fine. And the query execution result is the same as must be in case of using NOT IN.
So am I wrong with a supposition about bad abstract syntax tree or not? I will be glad to
receive any comments.
> Regards,
> Alexander

View raw message