mahout-user mailing list archives

From Suneel Marthi <suneel.mar...@gmail.com>
Subject Re: Streaming K Means exception without any reason
Date Thu, 09 Oct 2014 12:54:28 GMT
I've seen this issue happen a few times before. There are a few edge
conditions that need to be fixed in the Streaming KMeans code, and you are
right that the generated clusters differ between successive runs on the
same input.
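
As a general illustration of why identical input can yield different clusters: any clustering step that draws from an unseeded random generator picks different starting points on each run. The Java sketch below is a toy. The class and method names are hypothetical and it is not Mahout's code. It chooses initial centroid indices with a partial Fisher-Yates shuffle and shows that a fixed seed makes the choice repeat. This thread does not establish whether seeding alone would make Streaming KMeans runs reproducible.

```java
import java.util.Arrays;
import java.util.Random;

public class SeedingDemo {

    // Toy randomized step (not Mahout code): choose k distinct point indices
    // out of n as starting centroids, via a partial Fisher-Yates shuffle.
    static int[] pickInitialCentroids(int n, int k, Random random) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        for (int i = 0; i < k; i++) {
            // Swap a uniformly chosen remaining index into position i.
            int j = i + random.nextInt(n - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, k);
    }

    public static void main(String[] args) {
        // Unseeded generator: the picks will usually differ from run to run.
        System.out.println(Arrays.toString(pickInitialCentroids(100, 3, new Random())));

        // Fixed seed: the same picks on every run.
        int[] first = pickInitialCentroids(100, 3, new Random(42L));
        int[] second = pickInitialCentroids(100, 3, new Random(42L));
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```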

IIRC this stack trace is due to BallKMeans failing to read any input
centroids. I can't recall the sequence that leads to this off the top of my
head, so I will have to look.
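
To make the failure path in the quoted stack trace concrete, here is a simplified model. It assumes, as suggested above, that BallKMeans ends up with no centroids to search. The classes are hypothetical and this is not Mahout's implementation. A nearest-neighbour search over an empty centroid set has nothing to return. Wrapping that null in a constructor that null-checks its argument then throws a NullPointerException. Here Objects.requireNonNull stands in for Guava's Preconditions.checkNotNull. That matches the shape of the searchFirst to WeightedThing.<init> frames in the trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class EmptySearcherDemo {

    // Hypothetical stand-in for a (value, weight) pair; like WeightedThing.<init>
    // in the trace, it rejects a null value up front.
    static final class Weighted<T> {
        final T value;
        final double weight;

        Weighted(T value, double weight) {
            this.value = Objects.requireNonNull(value, "value");
            this.weight = weight;
        }
    }

    // Hypothetical brute-force nearest-neighbour search over 1-D centroids.
    static final class Searcher {
        private final List<Double> centroids = new ArrayList<>();

        void add(double c) {
            centroids.add(c);
        }

        // Returns the closest centroid wrapped with its distance.
        Weighted<Double> searchFirst(double query) {
            Double best = null;
            double bestDist = Double.POSITIVE_INFINITY;
            for (double c : centroids) {
                double d = Math.abs(c - query);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            // With no centroids, best is still null and the constructor throws.
            return new Weighted<>(best, bestDist);
        }
    }

    public static void main(String[] args) {
        Searcher searcher = new Searcher();
        try {
            searcher.searchFirst(1.0);
        } catch (NullPointerException e) {
            System.out.println("NPE: no centroids to search");
        }
        searcher.add(0.0);
        searcher.add(5.0);
        System.out.println(searcher.searchFirst(1.0).value); // prints 0.0
    }
}
```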

What's the size of your input, i.e. the number of points you are trying to
cluster, and how are you setting the value for --estimatedNumMapClusters?
Streaming KMeans is still experimental and has scalability issues that need
to be worked out.
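
The input size matters here because of how --estimatedNumMapClusters is sized. Assuming the rule of thumb in that option's description (roughly k * log(n), with n the number of points to cluster; the natural log is also an assumption), the suggested value grows with n. The quoted loop instead uses a fixed 3 * k. This small sketch with hypothetical helper names compares the two.

```java
public class MapClusterEstimate {

    // Assumed rule of thumb: about k * ln(n) map-side clusters, rounded up.
    static int suggestedMapClusters(int k, long numPoints) {
        if (k < 1 || numPoints < 2) {
            throw new IllegalArgumentException("need k >= 1 and at least 2 points");
        }
        return (int) Math.ceil(k * Math.log(numPoints));
    }

    // The setting used in the quoted loop: three map clusters per final cluster.
    static int questionSetting(int k) {
        return 3 * k;
    }

    public static void main(String[] args) {
        long n = 100_000; // hypothetical dataset size
        for (int k = 1; k <= 5; k++) {
            System.out.printf("k=%d  3k=%d  ceil(k*ln(n))=%d%n",
                    k, questionSetting(k), suggestedMapClusters(k, n));
        }
    }
}
```

Because ln(n) exceeds 3 once n is above about 20, k * ln(n) is larger than 3 * k for any realistically sized input. Whether that gap is related to the exception is not established here.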

There are a few other scenarios in which Streaming KMeans fails that you
should be aware of; see https://issues.apache.org/jira/browse/MAHOUT-1469.

Lemme take a look at this.



On Thu, Oct 9, 2014 at 5:39 AM, Marko Dinić <marko.dinic@nissatech.com> wrote:

> Hello everyone,
>
> I'm using Mahout Streaming K Means multiple times in a loop, each time on
> the same input data and with a different output path. Concretely, I'm
> increasing the number of clusters in each iteration. Currently it runs on a
> single machine.
>
> A few times (maybe 3 out of 20 runs) I get this exception:
>
> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
> INFO: Merging 1 sorted segments
> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.Merger$MergeQueue merge
> INFO: Down to the last merge-pass, with 1 segments left of total size: 1623 bytes
> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
> INFO:
> Oct 09, 2014 11:30:40 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local1196467414_0036
> java.lang.NullPointerException
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
>     at org.apache.mahout.math.random.WeightedThing.<init>(WeightedThing.java:31)
>     at org.apache.mahout.math.neighborhood.ProjectionSearch.searchFirst(ProjectionSearch.java:191)
>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.iterativeAssignment(BallKMeans.java:395)
>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:208)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>
> I'm running it like this:
>
> String[] args1 = new String[] {
>     "-i", dataPath,
>     "-o", plusOneCentroids,
>     "-k", String.valueOf(i + 1),
>     "--estimatedNumMapClusters", String.valueOf((i + 1) * 3),
>     "-ow"
> };
> StreamingKMeansDriver.main(args1);
>
> I'm using the same configuration and the same dataset, so I see no
> reason why I get this exception, and it's even stranger that it doesn't
> always occur.
>
> Any ideas?
>
> Thanks
>
