flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4565) Support for SQL IN operator
Date Thu, 10 Nov 2016 22:23:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655352#comment-15655352

Fabian Hueske commented on FLINK-4565:

Hi [~nvasilishin] and [~jark], I had at look how Calcite handles SQL queries with an {{x IN
(<list-of-literal>)}} predicate and how these are executed by the Table API.

This is what I found:

1. if the list has fewer entries than 20, the predicate is translated into disjunctive equality
predicates: {{(x = <lit-1> OR x = <lit-2> OR x = <lit3> ...)}}
2. if the list has 20 or more entries, the predicate is translated into a values-table ->
distinct -> equi-join (not sure why distinct is not done at planning time...). If {{IN}}
is the only predicate of the {{WHERE}} clause or if it is in a conjunctive condition {{y =
10 AND x IN (...)}} the join is an inner join. If {{IN}} is in an disjunctive condition {{y
= 10 OR x IN (...)}}, the join is a left outer join and some additional predicates to check
for {{NULL}} are added.

Unfortunately, the translation (and case distinction) is not done by optimizer rules but when
the validator generates the RelNodes from the SqlNodes. 
So for the Table API we would need to implement this logic ourselves, i.e, when LogicalNodes
are translated into RelNodes.

I would suggest to split this issue up into three subissues:

1) Support for IN with less than 20 literals. If we follow Calcite's approach of many disjunctive
equal predicates, we do not need to handle hashsets or any other special runtime code. Although
this would be faster than (up to) 19 checks, I have the feeling that the added code complexity
does not pay of the performance gain (esp. since I am not sure how common IN with 19 literals
2) Support for IN with 20 or more literals. This translation needs a bit of thought to make
sure that all cases are covered (no incorrect result, but possibly rejected queries).
3) Support for IN with subqueries / tables. This should be executed similar to the 20+ literal
case, but instead of a values table the subquery should be used as input.

What do you think?

> Support for SQL IN operator
> ---------------------------
>                 Key: FLINK-4565
>                 URL: https://issues.apache.org/jira/browse/FLINK-4565
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Timo Walther
>            Assignee: Nikolay Vasilishin
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But it should
also be available in the Table API and tested.

This message was sent by Atlassian JIRA

View raw message