spark-dev mailing list archives

From Simon NANTY
Subject Possible contribution to MLlib
Date Tue, 21 Jun 2016 11:48:09 GMT
Hi all,

My team is currently developing a fork of Spark MLlib that extends the K-means implementation
so that users can supply their own distance function. In this implementation, a distance
function can be passed directly as an argument to the K-means train function. Its signature
is: (VectorWithNorm, VectorWithNorm) => Double.
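
To make the idea concrete, here is a minimal, self-contained sketch in plain Scala (no Spark
dependency) of a K-means train function that takes such a distance function. It is only an
illustration, not the code of the fork: the VectorWithNorm class below is a hypothetical
stand-in for MLlib's internal class of the same name, and the object name, the train
signature, and the two example distances are assumptions.

```scala
// Hypothetical stand-in for MLlib's internal VectorWithNorm (not the real class).
final case class VectorWithNorm(vector: Array[Double]) {
  val norm: Double = math.sqrt(vector.map(x => x * x).sum)
}

object CustomDistanceKMeans {
  // The signature quoted in the message.
  type Distance = (VectorWithNorm, VectorWithNorm) => Double

  // Two example distances (illustrative choices).
  val euclidean: Distance = (a, b) =>
    math.sqrt(a.vector.zip(b.vector).map { case (x, y) => (x - y) * (x - y) }.sum)

  val manhattan: Distance = (a, b) =>
    a.vector.zip(b.vector).map { case (x, y) => math.abs(x - y) }.sum

  // Index of the center closest to p under the supplied distance.
  def closest(centers: Seq[VectorWithNorm], p: VectorWithNorm, d: Distance): Int =
    centers.indices.minBy(i => d(centers(i), p))

  // Lloyd-style iterations: assign each point to its closest center using the
  // supplied distance, then move each center to the arithmetic mean of its points.
  def train(
      data: Seq[VectorWithNorm],
      initial: Seq[VectorWithNorm],
      maxIterations: Int,
      distance: Distance): Seq[VectorWithNorm] = {
    var centers = initial
    for (_ <- 0 until maxIterations) {
      val groups = data.groupBy(p => closest(centers, p, distance))
      centers = centers.indices.map { i =>
        groups.get(i) match {
          case Some(points) =>
            val dim = points.head.vector.length
            val sum = new Array[Double](dim)
            points.foreach(p => for (j <- 0 until dim) sum(j) += p.vector(j))
            VectorWithNorm(sum.map(_ / points.size))
          case None => centers(i) // an empty cluster keeps its previous center
        }
      }
    }
    centers
  }
}
```

A call would then look like CustomDistanceKMeans.train(data, initialCenters, 20,
CustomDistanceKMeans.manhattan). Note that the sketch keeps the mean as the center update;
for an arbitrary distance the mean is not necessarily the point minimizing the total
distance within a cluster, so a real implementation may need a different update rule.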

We found the Jira issue SPARK-11665, which proposes supporting new distance measures in
bisecting K-means. There is also the Jira issue SPARK-3219, which proposes adding Bregman
divergences as distance functions, but that has not been added to MLlib. We are therefore
wondering whether such an extension of the MLlib K-means algorithm would be welcomed by the
community and would have a chance of being included in future Spark releases.


Simon Nanty
