flink-issues mailing list archives

From "Peter Schrott (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
Date Thu, 07 May 2015 13:36:59 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532619#comment-14532619 ]

Peter Schrott commented on FLINK-1731:
--------------------------------------

To implement the kMeans algorithm, some basic operations are missing from org.apache.flink.ml.math.Vector
(add, euclideanDistance, div). Should these be implemented there, or is it preferable
to use BreezeVector?
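A minimal sketch of what the three missing operations would compute, assuming a plain `Array[Double]` representation. The object name `VectorOps` and its method signatures are hypothetical and are not part of `org.apache.flink.ml.math`. If BreezeVector is chosen instead, Breeze's `DenseVector` already supports element-wise addition and division by a scalar.

```scala
// Hypothetical helper object, not part of org.apache.flink.ml.math:
// the three operations kMeans needs, over plain Array[Double].
object VectorOps {
  /** Element-wise sum of two vectors of equal length. */
  def add(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "dimension mismatch")
    Array.tabulate(a.length)(i => a(i) + b(i))
  }

  /** Divides every component by a scalar, e.g. a coordinate sum by a cluster size. */
  def div(a: Array[Double], s: Double): Array[Double] = a.map(_ / s)

  /** Euclidean (L2) distance between two points of equal dimension. */
  def euclideanDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    math.sqrt(a.indices.map { i => val d = a(i) - b(i); d * d }.sum)
  }
}
```

In kMeans, `add` and `div` together compute a new centroid as the mean of its assigned points, and `euclideanDistance` selects the nearest centroid for each point.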

> Add kMeans clustering algorithm to machine learning library
> -----------------------------------------------------------
>
>                 Key: FLINK-1731
>                 URL: https://issues.apache.org/jira/browse/FLINK-1731
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Alexander Alexandrov
>              Labels: ML
>
> The Flink repository already contains a kMeans implementation, but it has not yet been ported
> to the machine learning library. I assume that only the data types it uses have to be adapted;
> then it can be moved more or less directly to flink-ml.
> The kMeans++ [1] and kMeans|| [2] algorithms would make a better implementation because
> they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile
> to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf
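For orientation, a local, single-machine sketch of kMeans++ seeding [1] followed by standard Lloyd iterations. It is not the existing Flink kMeans example and not a flink-ml API; `LocalKMeans`, `kMeansPlusPlus` and `lloyd` are hypothetical names, and there is no Flink DataSet code here. kMeans|| [2] is not shown: it replaces the k sequential passes of kMeans++ with a few rounds that each oversample many candidate centers, which is what makes it attractive for a distributed setting.

```scala
import scala.util.Random

// Hypothetical local sketch: kMeans++ seeding [1] plus Lloyd iterations.
object LocalKMeans {
  type Point = Array[Double]

  /** Squared Euclidean distance. */
  def sqDist(a: Point, b: Point): Double =
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum

  /** Index of the centroid closest to p. */
  def closest(p: Point, centroids: Seq[Point]): Int =
    centroids.indices.minBy(j => sqDist(p, centroids(j)))

  /** First center uniform; each further center sampled with probability proportional to D(x)^2. */
  def kMeansPlusPlus(points: IndexedSeq[Point], k: Int, rnd: Random): IndexedSeq[Point] = {
    require(k >= 1 && k <= points.size, "need 1 <= k <= number of points")
    val centers = scala.collection.mutable.ArrayBuffer(points(rnd.nextInt(points.size)))
    while (centers.size < k) {
      // D(x)^2: squared distance from each point to its nearest chosen center
      val d2 = points.map(p => centers.map(c => sqDist(p, c)).min)
      val total = d2.sum
      if (total == 0.0) {
        // every point coincides with a center; fall back to a uniform choice
        centers += points(rnd.nextInt(points.size))
      } else {
        // walk the cumulative D^2 weights until the random threshold is passed
        var r = rnd.nextDouble() * total
        var i = 0
        while (i < points.size - 1 && r >= d2(i)) { r -= d2(i); i += 1 }
        centers += points(i)
      }
    }
    centers.toIndexedSeq
  }

  /** Lloyd's algorithm: assign each point to its nearest centroid, recompute means, repeat. */
  def lloyd(points: IndexedSeq[Point], init: IndexedSeq[Point], maxIter: Int): IndexedSeq[Point] = {
    var centroids = init
    var iter = 0
    var changed = true
    while (iter < maxIter && changed) {
      val groups = points.groupBy(p => closest(p, centroids))
      val updated = centroids.indices.map { j =>
        groups.get(j) match {
          case Some(members) =>
            val dim = members.head.length
            val sum = Array.tabulate(dim)(d => members.map(_(d)).sum)
            sum.map(_ / members.size) // mean of the assigned points
          case None => centroids(j) // keep an empty cluster's centroid in place
        }
      }
      changed = updated.zip(centroids).exists { case (a, b) => !a.sameElements(b) }
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```

Usage: `LocalKMeans.lloyd(pts, LocalKMeans.kMeansPlusPlus(pts, 2, new Random(42)), 10)` on two well-separated groups of points should return one centroid per group. Only the seeding step differs from plain kMeans, so swapping in a kMeans|| seeding later would leave `lloyd` unchanged.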



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
