flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Piotr Nowojski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10474) Don't translate IN to JOIN with VALUES for streaming queries
Date Tue, 09 Oct 2018 06:46:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642878#comment-16642878
] 

Piotr Nowojski commented on FLINK-10474:
----------------------------------------

Sorry for jumping a bit late to the discussion, but I would like to point out couple of drawbacks
of the 1. approach:
 # it's less general. 2nd option would/could cover more cases like: IN queries with bounded
table (not values), JOINS with bounded tables (or values). JOINS with bounded are something
that is being asked by the users and is something that we would like to have. If we go now
with 1. approach, it will be a wasted effort after implementing bounded JOINS. 
 # It adds complexity. Despite it being maybe easier to implement, it doesn't add new features
to the Flink, while increasing code complexity by adding some code to handle only a special
case.
 # It will complicate planning logic and will more diverge streaming plans from the batch.
This again will rise the complexity of the project (more moving parts, more things one have
to consider when analysing planning results).

> Don't translate IN to JOIN with VALUES for streaming queries
> ------------------------------------------------------------
>
>                 Key: FLINK-10474
>                 URL: https://issues.apache.org/jira/browse/FLINK-10474
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API &amp; SQL
>    Affects Versions: 1.6.1, 1.7.0
>            Reporter: Fabian Hueske
>            Assignee: Hequn Cheng
>            Priority: Major
>              Labels: pull-request-available
>
> IN clauses are translated to JOIN with VALUES if the number of elements in the IN clause
exceeds a certain threshold. This should not be done, because a streaming join is very heavy
and materializes both inputs (which is fine for the VALUES) input but not for the other.
> There are two ways to solve this:
>  # don't translate IN to a JOIN at all
>  # translate it to a JOIN but have a special join strategy if one input is bound and
final (non-updating)
> Option 1. should be easy to do, option 2. requires much more effort.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message