spark-user mailing list archives

From Yuhao Yang <>
Subject Re: [MLlib] kmeans random initialization, same seed every time
Date Wed, 15 Mar 2017 06:42:11 GMT
Hi Julian,

Thanks for reporting this. This is a valid issue and I created to track it.

Right now the seed is set to this.getClass.getName.hashCode.toLong by
default, which indeed stays the same across multiple fits. Feel free to
leave your comments or send a PR for the fix.
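To see why the default never changes, here is a minimal stdlib-only Scala sketch that evaluates the same kind of expression outside Spark. `DefaultSeedDemo` and `defaultSeed` are hypothetical names, and the class name `org.apache.spark.ml.clustering.KMeans` is assumed to be what `getClass.getName` returns for the DataFrame-based estimator. Because `String.hashCode` is fully specified by Java, the result is identical on every run and every JVM.

```scala
object DefaultSeedDemo {
  // Mirrors the default quoted above: the class name's String.hashCode,
  // widened to Long. (Hypothetical helper, not a Spark API.)
  def defaultSeed(className: String): Long = className.hashCode.toLong

  def main(args: Array[String]): Unit = {
    // Assumed fully qualified name of the spark.ml KMeans estimator.
    val name = "org.apache.spark.ml.clustering.KMeans"
    val first = defaultSeed(name)
    val second = defaultSeed(name)
    // Same input string -> same hash -> same "random" seed on every fit.
    println(s"first=$first second=$second equal=${first == second}")
  }
}
```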

For your problem, you may add .setSeed(new Random().nextLong()) before
fit() as a workaround.
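The workaround can be wrapped in a small helper so that every fit draws a fresh seed. This is a sketch assuming the Spark 2.x `spark.ml` API (`KMeans.setK`, `setInitMode`, `setSeed`, `getSeed`, `fit`) and an input DataFrame with the default "features" vector column; the names `FreshSeedKMeans`, `newEstimator` and `fitWithFreshSeed` are hypothetical.

```scala
import scala.util.Random

import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.sql.DataFrame

object FreshSeedKMeans {
  // Build a KMeans estimator with an explicitly drawn seed instead of the
  // class-name-derived default. A new Random is created per call unless
  // the caller passes one in.
  def newEstimator(k: Int, rng: Random = new Random()): KMeans =
    new KMeans()
      .setK(k)
      .setInitMode("random")   // the initialization mode discussed in the thread
      .setSeed(rng.nextLong()) // the workaround: set a random seed before fit()

  // Fit on a DataFrame that has a "features" vector column (the default featuresCol).
  def fitWithFreshSeed(df: DataFrame, k: Int): KMeansModel =
    newEstimator(k).fit(df)
}
```

Passing one shared, explicitly seeded `Random` (e.g. `new Random(42L)`) across calls keeps a whole experiment reproducible while still giving each fit a different seed, and `getSeed` on the estimator lets you log the seed that was actually used.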


2017-03-14 5:46 GMT-07:00 Julian Keppel <>:

> I'm sorry, I left out some important information. I use Spark version 2.0.2
> with Scala 2.11.8.
> 2017-03-14 13:44 GMT+01:00 Julian Keppel <>:
>> Hi everybody,
>> I am running some experiments with the Spark kmeans implementation in the new
>> DataFrame API. I compare clustering results of different runs with
>> different parameters. I noticed that in random initialization mode, the
>> seed value is the same every time. How is it calculated? In my
>> understanding, the seed should be random if it is not provided by the user.
>> Thank you for your help.
>> Julian
