flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1963) Improve distinct() transformation
Date Wed, 27 May 2015 22:40:17 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561882#comment-14561882 ]

Fabian Hueske commented on FLINK-1963:
--------------------------------------

Sure. Assume you have a {{DataSet<Integer>}}. {{Integer}} is an atomic type, i.e.,
it is not composed of other types. At the moment, it is not possible to use the {{distinct}} transformation
to convert a data set such as {{\[1,2,2,1,3,5,3\]}} into {{\[1,2,3,5\]}}.

To be consistent with the rest of the API, this should be possible in three ways:

{code}
DataSet<Integer> myInts = ...

DataSet<Integer> myUniqueInt1 = myInts.distinct();
DataSet<Integer> myUniqueInt2 = myInts.distinct("*"); // "*" is a wildcard expression (Java style) referring to the full type
DataSet<Integer> myUniqueInt3 = myInts.distinct("_"); // "_" is a wildcard expression (Scala style) referring to the full type
{code}
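
To make the intended semantics concrete, here is a minimal plain-Java sketch that does not use Flink at all. The class {{DistinctSketch}} and the helper {{distinctBy}} are hypothetical names made up for this illustration. The sketch assumes that a wildcard key on an atomic type simply means "use the element itself as the key", which is modeled here with {{Function.identity()}}. It is not how Flink implements {{distinct}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class DistinctSketch {

    // Hypothetical helper (not a Flink API): keeps the first element seen for each key.
    static <T, K> List<T> distinctBy(List<T> input, Function<T, K> keyOf) {
        Set<K> seen = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T element : input) {
            // Set.add returns false if the key was already present.
            if (seen.add(keyOf.apply(element))) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> myInts = Arrays.asList(1, 2, 2, 1, 3, 5, 3);

        // Assumption: for an atomic type, the wildcard key is the whole element.
        List<Integer> myUniqueInts = distinctBy(myInts, Function.identity());

        System.out.println(myUniqueInts); // prints [1, 2, 3, 5]
    }
}
```

The sketch preserves input order because it runs over a local list. A distributed {{DataSet}} should not be expected to give any ordering guarantee for the result of {{distinct}}.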

This section of the [Flink documentation|http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#specifying-keys]
about specifying keys might be interesting for you.

Let me know if you have further questions. 

> Improve distinct() transformation
> ---------------------------------
>
>                 Key: FLINK-1963
>                 URL: https://issues.apache.org/jira/browse/FLINK-1963
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 0.9
>            Reporter: Fabian Hueske
>            Assignee: pietro pinoli
>            Priority: Minor
>              Labels: starter
>             Fix For: 0.9
>
>
> The `distinct()` transformation is a bit limited right now with respect to processing atomic key types:
> - `distinct(String ...)` works only for composite data types (POJO, tuple), but wildcard expressions should also be supported for atomic key types
> - `distinct()` only works for composite types, but should also work for atomic key types
> - `distinct(KeySelector)` is the most generic one, but not very handy to use
> - `distinct(int ...)` works only for Tuple data types (which is fine)
> Fixing this should be rather easy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
