flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3738) Refactor TableEnvironment and TranslationContext
Date Wed, 13 Apr 2016 09:37:25 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238958#comment-15238958 ]

Fabian Hueske commented on FLINK-3738:
--------------------------------------

Hi [~yijieshen], that's a good observation. I would suggest opening a new JIRA for this issue.
FLINK-3632 is somewhat related to this as well.

In general, it would be good to validate as early as possible, ideally when the RelNodes are
constructed. This is not always possible with the current Table API. For instance, joins are
defined by {{join()}} and join predicates are added later with {{where()}}. At the moment, we
only allow equality joins for performance reasons, but this can only be checked after
optimization, when the DataSet program is constructed.
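To illustrate why this check has to wait until {{where()}} is called, here is a minimal Java sketch of an equality-join check. The expression classes ({{Expr}}, {{Field}}, {{Eq}}, {{And}}, {{Gt}}), {{JoinValidation}} and {{PendingJoin}} are hypothetical and are not Flink or Calcite classes. The sketch also assumes the strict rule that every conjunct of the predicate must be an equality between a field of the left input and a field of the right input.

```java
// Hypothetical mini expression tree for a join predicate (not Flink or Calcite classes).
abstract class Expr {}

final class Field extends Expr {
    final boolean leftInput; // true: field of the left join input, false: right input
    final String name;
    Field(boolean leftInput, String name) { this.leftInput = leftInput; this.name = name; }
}

final class Eq extends Expr {
    final Expr a, b;
    Eq(Expr a, Expr b) { this.a = a; this.b = b; }
}

final class And extends Expr {
    final Expr a, b;
    And(Expr a, Expr b) { this.a = a; this.b = b; }
}

final class Gt extends Expr {
    final Expr a, b;
    Gt(Expr a, Expr b) { this.a = a; this.b = b; }
}

final class JoinValidation {
    // Assumed rule: the predicate is a conjunction of equalities, each comparing
    // one field of the left input with one field of the right input.
    static boolean isEquiJoin(Expr p) {
        if (p instanceof And) {
            And and = (And) p;
            return isEquiJoin(and.a) && isEquiJoin(and.b);
        }
        if (p instanceof Eq) {
            Eq eq = (Eq) p;
            if (eq.a instanceof Field && eq.b instanceof Field) {
                // Both sides must come from different join inputs.
                return ((Field) eq.a).leftInput != ((Field) eq.b).leftInput;
            }
        }
        return false; // any other shape (e.g. Gt) is not an equality join
    }
}

// Hypothetical builder returned by join(): the predicate is unknown until where().
final class PendingJoin {
    private Expr predicate;

    // Validates as soon as the predicate is known, instead of at DataSet translation.
    PendingJoin where(Expr p) {
        if (!JoinValidation.isEquiJoin(p)) {
            throw new IllegalArgumentException("Only equality joins are supported");
        }
        this.predicate = p;
        return this;
    }

    Expr predicate() { return predicate; }
}
```

Running the check in {{where()}} would report a non-equality predicate when the query is defined rather than when the DataSet program is built.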

However, I think it should be possible to move more checks to the API level. So, it would
be good if you could open a JIRA (maybe with FLINK-3632 as a related issue or sub-issue) to
refactor the query validation.

> Refactor TableEnvironment and TranslationContext
> ------------------------------------------------
>
>                 Key: FLINK-3738
>                 URL: https://issues.apache.org/jira/browse/FLINK-3738
>             Project: Flink
>          Issue Type: Task
>          Components: Table API
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>
> Currently the TableAPI uses a static object called {{TranslationContext}} which holds
the Calcite table catalog and a Calcite planner instance. Whenever a {{DataSet}} or {{DataStream}}
is converted into a {{Table}} or registered as a {{Table}} on the {{TableEnvironment}}, a
new entry is added to the catalog. The first time a {{Table}} is added, a planner instance
is created. The planner is used to optimize the query (defined by one or more Table API operations
and/or one or more SQL queries) when a {{Table}} is converted into a {{DataSet}} or {{DataStream}}.
Since a planner may only be used to optimize a single program, the choice of a single static
object is problematic.
> I propose to refactor the {{TableEnvironment}} so that it takes over the responsibility of holding
the catalog and the planner instance.
> - A {{TableEnvironment}} holds a catalog of registered tables and a single planner instance.
> - A {{TableEnvironment}} will only allow translating a single {{Table}} (possibly composed
of several Table API operations and SQL queries) into a {{DataSet}} or {{DataStream}}.
> - A {{TableEnvironment}} is bound to an {{ExecutionEnvironment}} or a {{StreamExecutionEnvironment}}.
This is necessary to create data sources or source functions that read external tables or streams.
> - {{DataSet}} and {{DataStream}} need a reference to a {{TableEnvironment}} to be converted
into a {{Table}}. This rules out the implicit conversions currently supported by the Scala
DataSet API.
> - A {{Table}} needs a reference to the {{TableEnvironment}} it is bound to. Only tables
from the same {{TableEnvironment}} can be processed together.
> - The {{TranslationContext}} will be completely removed.
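The ownership model proposed above can be sketched in a few lines of Java. This is a minimal sketch under stated assumptions: {{TableEnv}}, {{Planner}} and {{Table}} are hypothetical stand-ins, query plans are plain strings instead of RelNode trees, a string name stands in for a {{DataSet}}, and an {{Object}} stands in for the {{ExecutionEnvironment}}. None of this is Flink's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Calcite planner: one instance optimizes one program.
final class Planner {
    private boolean used = false;

    String optimize(String plan) {
        if (used) throw new IllegalStateException("planner already used");
        used = true;
        return "optimized(" + plan + ")";
    }
}

// Hypothetical environment owning the catalog and the single planner (no static state).
final class TableEnv {
    final Object execEnv; // stands in for ExecutionEnvironment / StreamExecutionEnvironment
    private final Map<String, Table> catalog = new HashMap<>();
    private final Planner planner = new Planner();

    TableEnv(Object execEnv) { this.execEnv = execEnv; }

    // Conversion goes through an explicit environment (a String name stands in for a DataSet).
    Table fromDataSet(String dataSetName) {
        return new Table(this, "scan(" + dataSetName + ")");
    }

    void registerTable(String name, Table t) {
        if (t.env != this) throw new IllegalArgumentException("table belongs to another TableEnv");
        catalog.put(name, t);
    }

    Table scan(String name) {
        Table t = catalog.get(name);
        if (t == null) throw new IllegalArgumentException("unknown table " + name);
        return t;
    }

    // Uses the environment's only planner, so a second translation fails.
    String toDataSet(Table t) {
        if (t.env != this) throw new IllegalArgumentException("table belongs to another TableEnv");
        return planner.optimize(t.plan);
    }
}

// Hypothetical table bound to the environment that created it.
final class Table {
    final TableEnv env;
    final String plan;

    Table(TableEnv env, String plan) { this.env = env; this.plan = plan; }

    Table join(Table other) {
        if (other.env != env) {
            throw new IllegalArgumentException("cannot combine tables from different TableEnvs");
        }
        return new Table(env, "join(" + plan + ", " + other.plan + ")");
    }
}
```

Because the catalog and the planner are instance fields rather than static state, two environments in the same JVM no longer interfere, while each environment still enforces the single-translation rule and rejects tables from other environments.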



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
