flink-issues mailing list archives

From "Dawid Wysakowicz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2946) Add orderBy() to Table API
Date Fri, 01 Apr 2016 11:29:25 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221567#comment-15221567 ]

Dawid Wysakowicz commented on FLINK-2946:
-----------------------------------------

I still have some problems with range partitioning and parallelism. 

* First of all, the {{org.apache.flink.api.java.DataSet}} that I get from {{translateToPlan}}
does not have a {{getParallelism}} method. But that's a minor issue.
* I am not sure how to determine the parallelism the input will eventually run with, or whether
I need to do this at all. Let's take this as an example:

{code}
    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val t = env.fromElements((1, 3, "Third"), (1, 2, "Fourth"), (1, 4, "Second"),
      (2, 1, "Sixth"), (1, 5, "First"), (1, 1, "Fifth")).setParallelism(4)
      .toTable.orderBy('_1.asc, '_2.desc)
{code}

The resulting plan then looks like this (the number in brackets is the parallelism of the
operator): DataSource(4) -> MapOperator(-1) -> at this point I must apply either a single
SortOperator, or a PartitionOperator followed by a SortOperator.

Which parallelism should I look at to decide whether the PartitionOperator should be applied?
And what should the parallelism of the PartitionOperator itself be? (By default it is the one
from the ExecutionEnvironment.)
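
For illustration, the two alternatives can be sketched in plain Scala without any Flink
dependency. This is only a simplified model under stated assumptions, not Flink's implementation:
the names {{RangeSortSketch}} and {{sortWithParallelism}} are hypothetical, partitions are
modelled as in-memory sequences, and the range boundaries are taken from the fully sorted input,
where a real system would estimate them from a sample. With a parallelism of 1 a single sort is
enough; otherwise the rows are range partitioned with the same ordering that is used for sorting,
each partition is sorted locally, and reading the partitions in order gives the global order.

{code}
object RangeSortSketch {
  type Row = (Int, Int, String)

  // Ordering equivalent to orderBy('_1.asc, '_2.desc).
  val rowOrdering: Ordering[Row] = new Ordering[Row] {
    def compare(a: Row, b: Row): Int = {
      val c = Integer.compare(a._1, b._1)
      if (c != 0) c else Integer.compare(b._2, a._2) // second field descending
    }
  }

  // Hypothetical helper: models "sort only" vs. "range partition, then sort".
  def sortWithParallelism(rows: Seq[Row], parallelism: Int): Seq[Row] = {
    if (parallelism <= 1 || rows.size <= 1) {
      // One partition: a single local sort already yields the global order.
      rows.sorted(rowOrdering)
    } else {
      // Assumption: boundaries come from the sorted input itself;
      // a real system would sample the data instead.
      val sample = rows.sorted(rowOrdering)
      val splitters =
        (1 until parallelism).map(i => sample(i * sample.size / parallelism))

      // A row goes to the partition given by the number of splitters <= row.
      val partitions =
        rows.groupBy(r => splitters.count(s => rowOrdering.lteq(s, r)))

      // Sort each partition locally and read the partitions in range order.
      (0 until parallelism).flatMap { i =>
        partitions.getOrElse(i, Seq.empty[Row]).sorted(rowOrdering)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq((1, 3, "Third"), (1, 2, "Fourth"), (1, 4, "Second"),
      (2, 1, "Sixth"), (1, 5, "First"), (1, 1, "Fifth"))
    println(sortWithParallelism(rows, 1).map(_._3).mkString(","))
    println(sortWithParallelism(rows, 4).map(_._3).mkString(","))
  }
}
{code}

In this model both calls produce the same order. The open question above is which of the
parallelism values in the plan should drive the choice between the two branches.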

Hope I stated my problems clearly.

> Add orderBy() to Table API
> --------------------------
>
>                 Key: FLINK-2946
>                 URL: https://issues.apache.org/jira/browse/FLINK-2946
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API
>            Reporter: Timo Walther
>            Assignee: Dawid Wysakowicz
>
> In order to implement a FLINK-2099 prototype that uses the Table API's code generation
> facilities, the Table API needs a sorting feature.
> I would implement it in the next few days. Ideas on how to implement such a sorting feature
> are very welcome. Is there a more efficient way than {{.sortPartition(...).setParallelism(1)}}?
> Is it better to sort locally on the nodes first and then do a final sort on one node?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
