mahout-user mailing list archives

From: hiroshi leon <hiroshi_8...@hotmail.com>
Subject: RE: Mahout K-Means - Quality of the clusters
Date: Tue, 20 May 2014 08:00:55 GMT
Thanks Pat and David,

I tried what you told me to do, but unfortunately it is not working. I get the following error
when running the command:

./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt --evaluate true

"ERROR common.AbstractJob: Unexpected true while processing Job-Specific Options:
Unexpected true while processing Job-Specific Options." 

According to the clusterdump help, the --evaluate (-e) parameter is not supposed to take any value,
but if I do not put anything after it I get the Java NullPointerException.

These are 2 of the 23 clusters in my analyze.txt file; maybe they can help show whether something
is unexpected:

CL-0{n=113525 c=[10.821, 48.382, 66.019, 0.004, 0.000, 0.001, 0.000, 0.001, 0.001, 0.000,
0.000, 0.000, 0.000, 4.921, 8.565, 0.068, 0.068, 0.207, 0.205, 0.951, 0.052, 0.139, 209.864,
175.184, 0.731, 0.079, 0.119, 0.025, 0.069, 0.067, 0.191, 0.196] r=[91.194, 45.425, 78.914,
0.110, 0.008, 0.035, 0.028, 0.037, 0.038, 0.013, 0.008, 0.016, 0.011, 10.173, 23.152, 0.252,
0.252, 0.405, 0.403, 0.164, 0.195, 0.292, 80.182, 102.034, 0.395, 0.223, 0.290, 0.072, 0.251,
0.250, 0.381, 0.388]}

VL-1{n=17 c=[1.133, 0.669, 1.874, 1.460, 1.688, 1.818, 1.939, 1.255, 1.484, 1.697, 0.554,
1.042, 1.774, 0.818, 1.901, 1.522, 1.518, 1.098, 1.637, 1.611, 1.615, 1.212, 1.088, 1.133,
1.483, 0.761, 0.757, 0.953, 1.559, 1.696, 0.548, 0.975] r=[0.000, 0.000, 0.000, 0.000, NaN,
NaN, NaN, NaN, 0.000, 0.000, NaN, 0.000, 0.000, 0.000, NaN, 0.000, NaN, 0.000, NaN, 0.000,
0.000, 0.000, 0.000, 0.000, NaN, 0.000, NaN, 0.000, 0.000]}
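As a rough way to get an MSE-style number out of a dump like the one above, the Python sketch below parses the `{n=... c=[...] r=[...]}` records and estimates each cluster's mean squared distance to its centroid as the sum of its squared `r` entries. It is a minimal sketch under loud assumptions: it assumes `r` holds the per-dimension population standard deviation of the cluster's points (worth checking against your Mahout version), it treats `NaN` radius entries as 0, and the function names `parse_clusters`, `cluster_mse` and `overall_mse` are hypothetical, not part of Mahout.

```python
import math
import re

# Matches one clusterdump summary record such as "CL-0{n=113525 c=[...] r=[...]}".
# The field layout is inferred from the output quoted above (an assumption).
RECORD = re.compile(
    r"(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+)\s+c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\}"
)


def parse_vector(text):
    """Turn '1.0, NaN, 2.5' into floats; float() accepts 'NaN' and stray spaces."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def parse_clusters(dump_text):
    """Yield (cluster_id, n, centroid, radius) for every record in the text."""
    # Email line wrapping splits records across lines; collapse all whitespace first.
    flat = " ".join(dump_text.split())
    for m in RECORD.finditer(flat):
        yield m["id"], int(m["n"]), parse_vector(m["c"]), parse_vector(m["r"])


def cluster_mse(radius):
    """Mean squared distance to the centroid, ASSUMING r holds per-dimension
    population standard deviations: the sum of r_j**2 over all dimensions.
    NaN entries are treated as 0 (a hypothetical choice, not Mahout behavior)."""
    return sum(0.0 if math.isnan(x) else x * x for x in radius)


def overall_mse(dump_text):
    """Point-weighted average of the per-cluster estimates across all records."""
    total_n = 0
    total_sse = 0.0
    for _cid, n, _centroid, radius in parse_clusters(dump_text):
        total_n += n
        total_sse += n * cluster_mse(radius)
    return total_sse / total_n if total_n else float("nan")
```

For example, `overall_mse(open("analyze.txt").read())` would give a point-weighted average over every record in the file that matches this format; lines in any other format are simply skipped by the regular expression.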
 
Thanks!

> Subject: Re: Mahout K-Means - Quality of the clusters
> From: pat.ferrel@gmail.com
> Date: Mon, 19 May 2014 14:50:47 -0700
> To: user@mahout.apache.org; David.I.Noel@gmail.com
> 
> Yep, the clue is "--evaluate=null" in the console. Try "-e true". I think I ran into
> that a long time ago; it should really be fixed.
> 
> Try looking here for more explanation of cluster dump: https://mahout.apache.org/users/clustering/cluster-dumper.html
> 
> The docs are being greatly improved, so there's a chance you’ll find answers there.
> 
> On May 19, 2014, at 2:34 PM, David Noel <david.i.noel@gmail.com> wrote:
> 
> It works for me with just -e. Maybe try that or --evaluate true?
> 
> On 5/19/14, hiroshi leon <hiroshi_8712@hotmail.com> wrote:
> > Thanks Pat,
> > 
> > But how exactly can I run clusterdump using the --evaluate (-e) parameter?
> > When I try to run it, for example:
> > 
> > ./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt --evaluate
> > 
> > I get a Java NullPointerException:
> > 
> > 14/05/19 15:02:03 INFO common.AbstractJob: Command line arguments:
> > {--dictionaryType=[text],
> > --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> > --endPhase=[2147483647], --evaluate=null,
> > --input=[/user/Data-output/clusters-1-final], --output=[analyze.txt],
> > --outputFormat=[TEXT], --startPhase=[0], --tempDir=[temp]}
> > Exception in thread "main" java.lang.NullPointerException
> > 
> > Do I have to pass a value to --evaluate? As input for clusterdump I am
> > using the cluster output from running Mahout K-Means.
> > 
> >> Subject: Re: Mahout K-Means - Quality of the clusters
> >> From: pat.ferrel@gmail.com
> >> Date: Sat, 17 May 2014 09:43:59 -0700
> >> To: user@mahout.apache.org
> >> 
> >> mahout clusterdump --evaluate …
> >> 
> >> provides some stats
> >> 
> >> On May 15, 2014, at 10:23 PM, hiroshi leon <hiroshi_8712@hotmail.com>
> >> wrote:
> >> 
> >> Hello everybody,
> >> 
> >> Do you know how I can get the MSE of the clusters in Mahout K-Means?
> >> I would like to check the quality of the clusters. Thanks!
> >> 
> >> 
> > 
> 
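If the input vectors and the final centroids can be exported from HDFS into plain lists (how to export them is not covered here), the MSE asked about at the start of this thread can be computed directly instead of estimated. This is a minimal Python sketch assuming squared Euclidean distance (the measure named in the log above) and nearest-centroid assignment; `kmeans_mse` and `squared_distance` are hypothetical helpers, not Mahout APIs.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_mse(points, centroids):
    """Assign each point to its nearest centroid and return (mse, per_cluster_sse),
    where mse is the total sum of squared errors divided by the number of points."""
    if not points or not centroids:
        raise ValueError("need at least one point and one centroid")
    per_cluster_sse = [0.0] * len(centroids)
    for p in points:
        dists = [squared_distance(p, c) for c in centroids]
        # Index of the nearest centroid (ties go to the lowest index).
        k = min(range(len(centroids)), key=dists.__getitem__)
        per_cluster_sse[k] += dists[k]
    return sum(per_cluster_sse) / len(points), per_cluster_sse
```

Dividing each entry of `per_cluster_sse` by the number of points assigned to that cluster gives a per-cluster MSE, which is useful for spotting individual loose clusters.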
 		 	   		  