flink-issues mailing list archives

From "Nikolay Vasilishin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-4565) Support for SQL IN operator
Date Fri, 25 Nov 2016 16:38:59 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696107#comment-15696107 ]

Nikolay Vasilishin edited comment on FLINK-4565 at 11/25/16 4:38 PM:
---------------------------------------------------------------------

[~fhueske], I've finally opened the PR, [https://github.com/apache/flink/pull/2870].

As for subqueries:
As discussed above, there are a couple of ways to implement them:
 - create a table with an inner join somewhere at the beginning
 - try to use Calcite's IN operator when constructing the RexNode.

I tried to implement the second approach. It looks the way it should ([example|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-7c7c5dd5a5723b84b8d45424f04b2be5R68]).
In that case we need to [overload the in method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-04d1bca648d7ee47ab9ce787c8d944a6R108]
with a Table argument, then somehow construct a proper RexNode in the [toRexNode method|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-1b0f642e7f9b75bde5062b89b0b873e8R28]
[1] (for now it constructs a non-working plan, which I'll show below). The node is then passed to the [CodeGenerator|https://github.com/NickolayVasilishin/flink/commit/792a10440eede233260d3b2157fbd93af9e3572f#diff-c0ee691580cf6752e6cb186ca4f0260dR980],
where I have to implement the visitSubquery method. Actually, it's not hard to generate code for
all of the subquery's nodes, but I don't know what code should be generated to link the subquery with
the rest of the query.

A RexNode constructed this way produces the following logical plan:
{noformat}
LogicalFilter(condition=[IN(IN($2, {
LogicalProject(c=[$2])
  LogicalFilter(condition=[=($1, 6)])
    LogicalTableScan(table=[[_DataSetTable_0]])
}))])
  LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
Note the duplicated IN call.
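
One possible reading of the duplicated IN, sketched with a toy expression tree in plain Scala. It assumes that the node returned by RexSubQuery.in already carries the IN operator, so that wrapping it in relBuilder.call(SqlStdOperatorTable.IN, ...) as in footnote [1] adds a second one. The classes InputRef, SubQueryIn and Call are hypothetical stand-ins, not Calcite's classes, and this has not been checked against the linked commit.

```scala
// Toy expression tree in plain Scala; these are NOT Calcite's classes.
object DoubleIn {
  sealed trait Rex { def render: String }

  // Stand-in for a reference to an input column, printed as $index.
  final case class InputRef(index: Int) extends Rex {
    def render: String = "$" + index
  }

  // Stand-in for a sub-query node that already carries the IN operator.
  final case class SubQueryIn(operands: Seq[Rex], plan: String) extends Rex {
    def render: String =
      "IN(" + operands.map(_.render).mkString(", ") + ", {" + plan + "})"
  }

  // Stand-in for a generic operator call built around existing operands.
  final case class Call(op: String, operands: Seq[Rex]) extends Rex {
    def render: String =
      op + "(" + operands.map(_.render).mkString(", ") + ")"
  }

  // The sub-query node on its own is already a complete IN condition.
  val subQuery: Rex = SubQueryIn(Seq(InputRef(2)), "subplan")

  // Wrapping it in another IN call reproduces the nested shape from the plan.
  val wrapped: Rex = Call("IN", Seq(subQuery))
}
```

Under that assumption, using the sub-query node directly as the filter condition would avoid the nesting, although it would not yet answer how the CodeGenerator should handle the subquery.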

The logical plan generated for a similar query in the SQL API looks like this:
{noformat}
LogicalProject(a=[$0], c=[$2])
  LogicalJoin(condition=[=($1, $3)], joinType=[inner])
    LogicalTableScan(table=[[_DataSetTable_0]])
    LogicalAggregate(group=[{0}])
      LogicalProject(b=[$1])
        LogicalFilter(condition=[AND(>=($1, 6), <=($1, 9))])
          LogicalTableScan(table=[[_DataSetTable_0]])
{noformat}
So the SQL API implements it via the first approach.
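
A minimal sketch of why this plan is equivalent to an IN filter, using plain Scala collections only (no Flink or Calcite classes). The Row type, the sample query text and the function names are assumptions for illustration; the query is inferred from the plan and is not stated in this comment.

```scala
// Toy model in plain Scala collections; no Flink or Calcite classes are used.
object InAsJoin {
  // A row of a hypothetical table t with columns (a, b, c).
  final case class Row(a: Int, b: Int, c: String)

  // Direct semantics of the assumed query:
  //   SELECT a, c FROM t WHERE b IN (SELECT b FROM t WHERE b >= 6 AND b <= 9)
  def viaIn(t: Seq[Row]): Seq[(Int, String)] = {
    val sub = t.filter(r => r.b >= 6 && r.b <= 9).map(_.b)
    t.filter(r => sub.contains(r.b)).map(r => (r.a, r.c))
  }

  // The same query shaped like the plan:
  // filter -> project b -> group by b (deduplicate) -> inner join -> project a, c.
  def viaJoin(t: Seq[Row]): Seq[(Int, String)] = {
    val keys = t.filter(r => r.b >= 6 && r.b <= 9).map(_.b).distinct
    for (r <- t; k <- keys if r.b == k) yield (r.a, r.c)
  }
}
```

The deduplication step plays the role of LogicalAggregate(group=[{0}]): without it, a left row would be emitted once per matching subquery row instead of once.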

With the first approach, it's not clear to me where we would get a reference to the first (left) table,
since we invoke the in method on expressions like 'column. But I haven't thought it through yet.

[1] Sorry, I forgot to change the code to this:
{noformat}
val in: RexSubQuery = RexSubQuery.in(
  table.getRelNode,
  new ImmutableList.Builder[RexNode]().add(children.map(_.toRexNode): _*).build())
relBuilder.call(SqlStdOperatorTable.IN, in)
{noformat}
This code generates the plan shown above.



> Support for SQL IN operator
> ---------------------------
>
>                 Key: FLINK-4565
>                 URL: https://issues.apache.org/jira/browse/FLINK-4565
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Timo Walther
>            Assignee: Nikolay Vasilishin
>
> It seems that Flink SQL supports the uncorrelated sub-query IN operator. But it should
> also be available in the Table API and tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
