mahout-user mailing list archives

From Marko Dinić <marko.di...@nissatech.com>
Subject Re: Streaming K Means exception without any reason
Date Thu, 09 Oct 2014 13:18:29 GMT
Suneel,

Thank you for your answer, this was rather strange to me.

The number of points is 942. I do multiple runs; in each run a loop 
increases the number of clusters on every iteration, and I multiply 
that number by 3 to set the estimated number of map clusters, since I'm 
expecting log(n) initial centroids before the Ball K Means step. It's 
actually an attempt at implementing the elbow method. It's very strange 
that the crash happens only occasionally.
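
To make the loop concrete, here is a minimal sketch of how the arguments 
for each iteration are assembled. The class name ElbowArgs, the helper 
names, the paths, and the output directory naming are hypothetical and 
only for illustration; the flags are the ones from my original message. 
The call to StreamingKMeansDriver.main is left as a comment so the sketch 
runs without Mahout on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class ElbowArgs {

    // Builds the argument array for one run with k clusters.
    // "factor" is the multiplier applied to k for --estimatedNumMapClusters
    // (3 in my runs; this is my own choice, not a recommended value).
    static String[] buildArgs(String dataPath, String outputPath, int k, int factor) {
        return new String[] {
            "-i", dataPath,
            "-o", outputPath,
            "-k", String.valueOf(k),
            "--estimatedNumMapClusters", String.valueOf(k * factor),
            "-ow"
        };
    }

    // Builds one argument array per k in 1..maxK, each with its own output path.
    // The "/k-<n>" directory naming is a hypothetical convention.
    static List<String[]> buildAllRuns(String dataPath, String outputBase, int maxK, int factor) {
        List<String[]> runs = new ArrayList<>();
        for (int k = 1; k <= maxK; k++) {
            runs.add(buildArgs(dataPath, outputBase + "/k-" + k, k, factor));
        }
        return runs;
    }

    public static void main(String[] args) {
        for (String[] run : buildAllRuns("data/points", "output/centroids", 5, 3)) {
            System.out.println(String.join(" ", run));
            // In the real loop, each array is passed on like this:
            // StreamingKMeansDriver.main(run);
        }
    }
}
```

Each run gets a different output path, and -k and 
--estimatedNumMapClusters grow together from one iteration to the next.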

Can I expect problems like this to be fixed in the future? I'm using it 
because it gives better results, both in speed and clustering quality, 
but it would be a problem if it keeps crashing like this.

On Thursday, 09 October 2014 14:54:28 CEST, Suneel Marthi wrote:
> I've seen this issue happen a few times before. There are a few edge
> conditions that need to be fixed in the Streaming KMeans code, and you are
> right that the generated clusters are different on successive runs given
> the same input.
>
> IIRC this stacktrace is due to BallKMeans failing to read any input
> centroids - can't recall the sequence that leads to this off the top of my
> head, will have to look.
>
> What's the size of your input (the number of points you are trying to
> cluster), and how are you setting the value for --estimatedNumMapClusters?
> Streaming KMeans is still experimental and has scalability issues that need
> to be worked out.
>
> There are a few other scenarios in which Streaming KMeans fails that you
> should be aware of, see https://issues.apache.org/jira/browse/MAHOUT-1469.
>
> Lemme take a look at this.
>
>
>
> On Thu, Oct 9, 2014 at 5:39 AM, Marko Dinić <marko.dinic@nissatech.com>
> wrote:
>
>> Hello everyone,
>>
>> I'm using Mahout Streaming K Means multiple times in a loop, every time
>> on the same input data but with a different output path. Concretely, I'm
>> increasing the number of clusters in each iteration. Currently it runs on
>> a single machine.
>>
>> A couple of times (maybe 3 of 20 runs) I get this exception
>>
>> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
>> INFO: Merging 1 sorted segments
>> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
>> INFO: Down to the last merge-pass, with 1 segments left of total size: 1623 bytes
>> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
>> INFO:
>> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
>> WARNING: job_local1196467414_0036
>> java.lang.NullPointerException
>>      at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
>>      at org.apache.mahout.math.random.WeightedThing.<init>(WeightedThing.java:31)
>>      at org.apache.mahout.math.neighborhood.ProjectionSearch.searchFirst(ProjectionSearch.java:191)
>>      at org.apache.mahout.clustering.streaming.cluster.BallKMeans.iterativeAssignment(BallKMeans.java:395)
>>      at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:208)
>>      at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>      at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>      at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>
>> I'm running it like this:
>>
>> String[] args1 = new String[] {
>>     "-i", dataPath,
>>     "-o", plusOneCentroids,
>>     "-k", String.valueOf(i + 1),
>>     "--estimatedNumMapClusters", String.valueOf((i + 1) * 3),
>>     "-ow"};
>> StreamingKMeansDriver.main(args1);
>>
>> I'm using the same configuration, and the same dataset, but I see no
>> reason why I get this exception, and it's even stranger that it doesn't
>> always occur.
>>
>> Any ideas?
>>
>> Thanks
>>
>

--
Pozdrav,
Marko Dinić
