spark-user mailing list archives

From Nirmal Fernando <nir...@wso2.com>
Subject Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time
Date Mon, 13 Jul 2015 17:57:18 GMT
Hi Burak,

k = 3
dimension = 785 features
Spark 1.4
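
For reference, the quantity computeCost reports is the K-means cost: the sum of squared distances from each point to its nearest cluster center. Below is a minimal plain-Python sketch of that definition, not Spark's implementation. The names `squared_distance` and `compute_cost` and the tiny 2-D dataset are made up for illustration. Computed this naive way, the arithmetic per point is on the order of k × dimension operations.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_cost(points: Sequence[Vector], centers: Sequence[Vector]) -> float:
    """Sum, over all points, of the squared distance to the nearest center.

    Illustrative only: a local, single-threaded version of the definition,
    not the distributed code path that MLlib runs.
    """
    return sum(
        min(squared_distance(p, c) for c in centers)
        for p in points
    )


# Hypothetical toy data: k = 3 centers in 2 dimensions.
centers = [[0.0, 0.0], [10.0, 10.0], [-10.0, 5.0]]
points = [[1.0, 0.0], [9.0, 10.0], [-10.0, 7.0]]

cost = compute_cost(points, centers)
print(cost)
```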

On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brkyvz@gmail.com> wrote:

> Hi,
>
> How are you running K-Means? What is your k? What is the dimension of your
> dataset (columns)? Which Spark version are you using?
>
> Thanks,
> Burak
>
> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nirmal@wso2.com> wrote:
>
>> Hi,
>>
>> For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of
>> time (16+ minutes).
>>
>> Most of the time is spent in this task:
>>
>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>
>> Can this be improved?
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/
>>
>>
>>
>


-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
