spark-user mailing list archives

From Burak Yavuz <brk...@gmail.com>
Subject Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time
Date Mon, 13 Jul 2015 18:14:44 GMT
Can you call repartition(8) (or 16) on your data before running KMeans, and
also .cache() it?

Something like this (I'm assuming you are using Java):
```
// Spread the data over 8 partitions and cache it, since KMeans makes many passes over it
JavaRDD<Vector> input = data.repartition(8).cache();
org.apache.spark.mllib.clustering.KMeans.train(input.rdd(), 3, 20);
```
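To illustrate why the partition count matters here, the following is a Spark-free sketch in plain Java (the class `PartitionedKMeansCost` and its methods are hypothetical, not Spark's implementation). It mirrors what the stack trace further down suggests computeCost does: map each point to its squared distance from the nearest center, then sum those values. The points are split into `numPartitions` chunks that are summed on separate threads, loosely analogous to Spark running one task per RDD partition; with a single partition, one thread does all the work no matter how many cores are available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, Spark-free sketch: k-means cost summed over point "partitions" in parallel.
public class PartitionedKMeansCost {

    // Squared Euclidean distance from a point to its closest center.
    static double pointCost(double[] point, double[][] centers) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] center : centers) {
            double dist = 0.0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - center[i];
                dist += diff * diff;
            }
            best = Math.min(best, dist);
        }
        return best;
    }

    // Splits the points into numPartitions contiguous chunks (numPartitions must be >= 1),
    // sums pointCost within each chunk on its own thread, then adds up the partial sums.
    static double computeCost(double[][] points, double[][] centers, int numPartitions) {
        ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
        try {
            List<Future<Double>> partials = new ArrayList<>();
            int chunk = (points.length + numPartitions - 1) / numPartitions;
            for (int p = 0; p < numPartitions; p++) {
                int start = p * chunk;
                int end = Math.min(points.length, start + chunk);
                partials.add(pool.submit(() -> {
                    double sum = 0.0;
                    for (int i = start; i < end; i++) {
                        sum += pointCost(points[i], centers);
                    }
                    return sum;
                }));
            }
            double total = 0.0;
            for (Future<Double> partial : partials) {
                total += partial.get(); // combine the per-chunk sums
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while computing cost", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("cost computation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] points = {{1.0, 0.0}, {0.0, 2.0}, {10.0, 11.0}, {13.0, 14.0}};
        System.out.println(computeCost(points, centers, 2)); // prints 31.0
    }
}
```

With the toy data in `main`, 1, 2 or 8 partitions all give the same total (31.0); only the way the work is divided between threads changes.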

On Mon, Jul 13, 2015 at 11:10 AM, Nirmal Fernando <nirmal@wso2.com> wrote:

> I'm using:
>
> org.apache.spark.mllib.clustering.KMeans.train(data.rdd(), 3, 20);
>
> CPU cores: 8 (using the default Spark conf, though)
>
> As for partitions, I'm not sure how to find that out.
>
> On Mon, Jul 13, 2015 at 11:30 PM, Burak Yavuz <brkyvz@gmail.com> wrote:
>
>> What are the other parameters? Are you just setting k=3? What about # of
>> runs? How many partitions do you have? How many cores does your machine
>> have?
>>
>> Thanks,
>> Burak
>>
>> On Mon, Jul 13, 2015 at 10:57 AM, Nirmal Fernando <nirmal@wso2.com>
>> wrote:
>>
>>> Hi Burak,
>>>
>>> k = 3
>>> dimension = 785 features
>>> Spark 1.4
>>>
>>> On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brkyvz@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> How are you running K-Means? What is your k? What is the dimension of
>>>> your dataset (columns)? Which Spark version are you using?
>>>>
>>>> Thanks,
>>>> Burak
>>>>
>>>> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nirmal@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> For a fairly large dataset (30 MB), KMeansModel.computeCost takes a lot
>>>>> of time (16+ minutes).
>>>>>
>>>>> It takes a lot of time in this task:
>>>>>
>>>>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>>>>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>>>>
>>>>> Can this be improved?
>>>>>
>>>>
>>>
>>>
>>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>
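
For reference, the value KMeansModel.computeCost returns is the k-means cost described in the MLlib API docs: the sum of squared distances from each point to its nearest center. Writing the n input points as x_i and the k centers as c_j (notation introduced here, not from the thread), a naive evaluation with the numbers given above (k = 3, d = 785) touches k·d coordinate differences per point; n itself is not stated in the thread.

$$
\mathrm{cost} = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_2^2,
\qquad
k \cdot d = 3 \times 785 = 2355 \ \text{squared differences per point}
$$

So one cost pass is roughly 2355·n such operations, and how fast it finishes depends on how many partitions (and therefore parallel tasks) the input RDD has.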
