spark-user mailing list archives

From Xi Shen <davidshe...@gmail.com>
Subject Re: Why k-means cluster hang for a long time?
Date Thu, 26 Mar 2015 22:48:56 GMT
Hi Burak,

My iteration count is set to 500. But I think it should also stop if the
centroids converge, right?
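
To make the stopping rule I am asking about concrete, here is a minimal
pure-Python sketch of Lloyd's k-means. It is not Spark's implementation; the
names kmeans, max_iterations and epsilon are only illustrative assumptions.
The loop ends either when the iteration cap is reached or when no centroid
moves by more than epsilon, whichever happens first.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, initial_centroids, max_iterations=500, epsilon=1e-4):
    """Illustrative Lloyd's algorithm; returns (centroids, iterations_used)."""
    centroids = [list(c) for c in initial_centroids]
    for iteration in range(1, max_iterations + 1):
        # Assignment step: group each point under its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    [sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centroids.append(old)  # empty cluster: keep old centroid

        # Convergence check: largest distance any centroid moved this round.
        moved = max(distance(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if moved <= epsilon:
            return centroids, iteration  # converged before the cap

    return centroids, max_iterations  # hit the iteration cap


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centers, used = kmeans(data, [(0.0, 0.0), (10.0, 10.0)])
    print(centers, used)
```

With well-separated data like the toy input above, the loop should exit long
before 500 iterations, which is the behaviour I expected from my job as well.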

My Spark version is 1.2.0, running on 64-bit Windows. My data set is about
40k vectors, each with about 300 features, all normalised. All worker nodes
have sufficient memory and disk space.

Thanks,
David

On Fri, 27 Mar 2015 02:48 Burak Yavuz <brkyvz@gmail.com> wrote:

> Hi David,
>
> When the number of runs is large and the data is not properly
> partitioned, K-Means seems to hang, in my experience. In particular,
> setting the number of runs to something high drastically increases the
> work in the executors. If that's not the case, can you give more info on
> which Spark version you are using, your setup, and your dataset?
>
> Thanks,
> Burak
> On Mar 26, 2015 5:10 AM, "Xi Shen" <davidshen84@gmail.com> wrote:
>
>> Hi,
>>
>> When I run k-means clustering with Spark, I get this in the last two
>> lines of the log:
>>
>> 15/03/26 11:42:42 INFO spark.ContextCleaner: Cleaned broadcast 26
>> 15/03/26 11:42:42 INFO spark.ContextCleaner: Cleaned shuffle 5
>>
>>
>>
>> Then it hangs for a long time. There's no active job. The driver machine
>> is idle. I cannot access the worker nodes, so I am not sure if they are
>> busy.
>>
>> I understand k-means may take a long time to finish. But why is there no
>> active job and no log output?
>>
>>
>> Thanks,
>> David
>>
>>
