flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders
Date Thu, 07 Apr 2016 15:42:25 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230430#comment-15230430 ]

ASF GitHub Bot commented on FLINK-3665:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1848#discussion_r58894245
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/operators/base/PartitionOperatorBase.java ---
    @@ -51,12 +53,19 @@
     	private Partitioner<?> customPartitioner;
     	
     	private DataDistribution distribution;
    +
    +	private Ordering ordering;
     	
     	
     	public PartitionOperatorBase(UnaryOperatorInformation<IN, IN> operatorInfo, PartitionMethod pMethod, int[] keys, String name) {
     		super(new UserCodeObjectWrapper<NoOpFunction>(new NoOpFunction()), operatorInfo, keys, name);
     		this.partitionMethod = pMethod;
     	}
    +
    +	public PartitionOperatorBase(UnaryOperatorInformation<IN, IN> operatorInfo, PartitionMethod pMethod, int[] keys, Order[] orders, String name) {
    --- End diff --
    
    I think this constructor is not used, so it can be removed.
    Also, the `orders` parameter is never assigned in this constructor.


> Range partitioning lacks support to define sort orders
> ------------------------------------------------------
>
>                 Key: FLINK-3665
>                 URL: https://issues.apache.org/jira/browse/FLINK-3665
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataSet API
>    Affects Versions: 1.0.0
>            Reporter: Fabian Hueske
>             Fix For: 1.1.0
>
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of fields. This is fine if range partitioning is used to reduce skewed partitioning.
> However, it is not sufficient if range partitioning is used to sort a data set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily changed, I propose to add a method {{withOrders(Order... orders)}} to {{PartitionOperator}}. The method should throw an exception if the partitioning method of {{PartitionOperator}} is not range partitioning.
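
The proposal in the issue description could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Flink code: the `Order`, `PartitionMethod`, and `PartitionOperator` types below are simplified stand-ins for the Flink classes of the same name, and both the exception types and the check that the number of orders matches the number of keys are assumptions that the issue does not specify.

```java
import java.util.Arrays;

public class WithOrdersSketch {

    // Simplified stand-ins for Flink's enums (hypothetical, reduced to what the sketch needs).
    enum Order { ASCENDING, DESCENDING }
    enum PartitionMethod { HASH, RANGE }

    // Simplified stand-in for PartitionOperator; not the real Flink class.
    static final class PartitionOperator {
        private final PartitionMethod method;
        private final int[] keys;
        private Order[] orders; // stays null until withOrders() is called

        PartitionOperator(PartitionMethod method, int... keys) {
            this.method = method;
            this.keys = keys.clone();
        }

        // Sets one sort order per key field; only valid for range partitioning.
        PartitionOperator withOrders(Order... orders) {
            if (method != PartitionMethod.RANGE) {
                // Exception type is an assumption; the issue only says "an exception".
                throw new IllegalStateException(
                        "Orders can only be set for range partitioning, but the method is " + method);
            }
            if (orders.length != keys.length) {
                // Assumed sanity check, not part of the original proposal.
                throw new IllegalArgumentException(
                        "Expected " + keys.length + " orders but got " + orders.length);
            }
            this.orders = orders.clone();
            return this;
        }

        Order[] getOrders() {
            return orders == null ? null : orders.clone();
        }
    }

    public static void main(String[] args) {
        // Range partitioning accepts sort orders.
        PartitionOperator range = new PartitionOperator(PartitionMethod.RANGE, 0, 1)
                .withOrders(Order.ASCENDING, Order.DESCENDING);
        System.out.println(Arrays.toString(range.getOrders()));

        // Any other partitioning method rejects them.
        try {
            new PartitionOperator(PartitionMethod.HASH, 0).withOrders(Order.ASCENDING);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Returning `this` from `withOrders` keeps the call chainable after `partitionByRange()`, which is how the method can be added without changing the existing `@Public` signature.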



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
