flink-issues mailing list archives

From "Theodore Vasiloudis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
Date Thu, 14 May 2015 18:34:59 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544164#comment-14544164 ]

Theodore Vasiloudis commented on FLINK-1731:
--------------------------------------------

Yeah, that might be the better option. The optimization framework is more developer-oriented,
but since kMeans is mostly aimed at practitioners, it would be better to abstract away the
complexity.

> Add kMeans clustering algorithm to machine learning library
> -----------------------------------------------------------
>
>                 Key: FLINK-1731
>                 URL: https://issues.apache.org/jira/browse/FLINK-1731
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Peter Schrott
>              Labels: ML
>
> The Flink repository already contains a kMeans implementation, but it has not yet been ported
> to the machine learning library. I assume that only the data types used have to be adapted,
> and then it can be moved more or less directly to flink-ml.
> The kMeans++ [1] and kMeans|| [2] algorithms are better choices because
> they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile
> to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
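
For readers unfamiliar with the seeding phase mentioned in the issue description, the following is a minimal single-machine sketch of kMeans++ seeding as described in [1]: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. It is not the Flink or flink-ml implementation; the names kmeans_pp_seed and squared_distance are hypothetical and chosen only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, rng=None):
    """Pick k initial centers from points using kMeans++ (D^2) sampling.

    Hypothetical helper for illustration only, not a flink-ml API.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # First center: uniform at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Only the new center can shrink a point's nearest-center distance.
        d2 = [min(d, squared_distance(p, nxt)) for p, d in zip(points, d2)]

    return centers
```

The returned centers would then be handed to the usual kMeans iteration as its starting point. kMeans|| [2] pursues the same goal but oversamples several candidates per round so that the seeding needs fewer passes over distributed data; it is not shown here.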



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
