flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aljoscha Krettek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6936) Add multiple targets support for custom partitioner
Date Tue, 27 Jun 2017 15:26:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064998#comment-16064998

Aljoscha Krettek commented on FLINK-6936:

[~xccui] I agree with you that having an extra {{KeySelector}} when using a custom {{Partitioner}}
seems unnecessary. The only purpose for having a {{KeySelector}} is when you want to use keyed
state, which you cannot use if you have a custom key selector.

Do you have a design document about how you want to approach the stream-stream theta join?
I would be especially interested in how you want to deal with fault-tolerance, i.e. how the
in-flight data is being buffered while waiting for events to join with.

> Add multiple targets support for custom partitioner
> ---------------------------------------------------
>                 Key: FLINK-6936
>                 URL: https://issues.apache.org/jira/browse/FLINK-6936
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Xingcan Cui
>            Assignee: Xingcan Cui
>            Priority: Minor
> The current user-facing Partitioner only allows returning one target.
> {code:java}
> @Public
> public interface Partitioner<K> extends java.io.Serializable, Function {
> 	/**
> 	 * Computes the partition for the given key.
> 	 *
> 	 * @param key The key.
> 	 * @param numPartitions The number of partitions to partition into.
> 	 * @return The partition index.
> 	 */
> 	int partition(K key, int numPartitions);
> }
> {code}
> Actually, this function should return multiple partitions and this may be a historical
> There could be at least three approaches to solve this.
> # Make the `protected DataStream<T> setConnectionType(StreamPartitioner<T>
partitioner)` method in DataStream public and that allows users to directly define StreamPartitioner.
> # Change the `partition` method in the Partitioner interface to return an int array instead
of a single int value.
> # Add a new `multicast` method to DataStream and provide a MultiPartitioner interface
which returns an int array.
> Considering the consistency of API, the 3rd approach seems to be an acceptable choice.
[~aljoscha], what do you think?

This message was sent by Atlassian JIRA

View raw message