spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nirmal Fernando <nir...@wso2.com>
Subject [MLLib][Kmeans] KMeansModel.computeCost takes lot of time
Date Mon, 13 Jul 2015 09:53:08 GMT
Hi,

For a fairly large dataset, 30MB, KMeansModel.computeCost takes lot of time
(16+ mints).

It takes lot of time at this task;

org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)

Can this be improved?

-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/

Mime
View raw message