mahout-user mailing list archives

From Pat Ferrel <pat.fer...@gmail.com>
Subject Re: Mahout K-Means - Quality of the clusters
Date Tue, 20 May 2014 17:02:35 GMT
I looked at the code, and -e shouldn't need a value. The null pointer exception comes from
the other params. Unfortunately the help doesn't say which params are required. It looks like
the evaluation step needs the clustered points, and the distance measure should be the same
one you used to cluster.

      if (runEvaluation) {
        HadoopUtil.delete(conf, new Path("tmp/representative"));
        int numIters = 5;
        RepresentativePointsDriver.main(new String[]{
          "--input", seqFileDir.toString(),
          "--output", "tmp/representative",
          "--clusteredPoints", pointsDir.toString(),
          "--distanceMeasure", measure.getClass().getName(),
          "--maxIter", String.valueOf(numIters)
        });

I don't have any example clusters right now, so I can't run it myself.
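
For the original question (the MSE of the clusters), the number can also be computed by
hand, independent of clusterdump: average the squared distance from each point to the
centroid it was assigned to. A minimal sketch in plain Java follows. It does not use any
Mahout API; the class ClusterMse, its method names, and the array-based inputs are all
hypothetical, and reading the points and assignments out of the clustered points output
is left out.

```java
// Hypothetical helper, not part of Mahout: computes the mean squared error
// of a clustering from in-memory arrays.
final class ClusterMse {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // points[i] is assigned to centers[assignment[i]].
    // Returns the average squared distance from each point to its center.
    static double meanSquaredError(double[][] points, double[][] centers, int[] assignment) {
        if (points.length == 0 || points.length != assignment.length) {
            throw new IllegalArgumentException("need one assignment per point");
        }
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            total += squaredDistance(points[i], centers[assignment[i]]);
        }
        return total / points.length;
    }
}
```

For example, with points (0,0), (2,0), (10,10), (10,12), centers (1,0) and (10,11), and
assignments {0, 0, 1, 1}, every point is at squared distance 1 from its center, so
meanSquaredError returns 1.0.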

On May 20, 2014, at 1:00 AM, hiroshi leon <hiroshi_8712@hotmail.com> wrote:

Thanks Pat and David,

I tried what you suggested, but unfortunately it is not working. I get the following error
when running the command:

./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt --evaluate true

"ERROR common.AbstractJob: Unexpected true while processing Job-Specific Options:
Unexpected true while processing Job-Specific Options." 

According to the clusterdump help, the --evaluate (-e) parameter is not supposed to take a
value, but if I leave the value out I get the Java NullPointerException.

These are 2 of the 23 clusters in my analyze.txt file; maybe they can help to explain
whether something unexpected is going on:

CL-0{n=113525 c=[10.821, 48.382, 66.019, 0.004, 0.000, 0.001, 0.000, 0.001, 0.001, 0.000,
0.000, 0.000, 0.000, 4.921, 8.565, 0.068, 0.068, 0.207, 0.205, 0.951, 0.052, 0.139, 209.864,
175.184, 0.731, 0.079, 0.119, 0.025, 0.069, 0.067, 0.191, 0.196] r=[91.194, 45.425, 78.914,
0.110, 0.008, 0.035, 0.028, 0.037, 0.038, 0.013, 0.008, 0.016, 0.011, 10.173, 23.152, 0.252,
0.252, 0.405, 0.403, 0.164, 0.195, 0.292, 80.182, 102.034, 0.395, 0.223, 0.290, 0.072, 0.251,
0.250, 0.381, 0.388]}

VL-1{n=17 c=[1.133, 0.669, 1.874, 1.460, 1.688, 1.818, 1.939, 1.255, 1.484, 1.697, 0.554,
1.042, 1.774, 0.818, 1.901, 1.522, 1.518, 1.098, 1.637, 1.611, 1.615, 1.212, 1.088, 1.133,
1.483, 0.761, 0.757, 0.953, 1.559, 1.696, 0.548, 0.975] r=[0.000, 0.000, 0.000, 0.000, NaN,
NaN, NaN, NaN, 0.000, 0.000, NaN, 0.000, 0.000, 0.000, NaN, 0.000, NaN, 0.000, NaN, 0.000,
0.000, 0.000, 0.000, 0.000, NaN, 0.000, NaN, 0.000, 0.000]}

Thanks!

> Subject: Re: Mahout K-Means - Quality of the clusters
> From: pat.ferrel@gmail.com
> Date: Mon, 19 May 2014 14:50:47 -0700
> To: user@mahout.apache.org; David.I.Noel@gmail.com
> 
> Yep, the clue is "--evaluate=null" in the console. Try "-e true". I think I ran into
> that a long time ago; it should really be fixed.
> 
> Try looking here for more explanation of cluster dump: https://mahout.apache.org/users/clustering/cluster-dumper.html
> 
> The docs are being greatly improved, so there's a chance you’ll find answers there.
> 
> On May 19, 2014, at 2:34 PM, David Noel <david.i.noel@gmail.com> wrote:
> 
> It works for me with just -e. Maybe try that or --evaluate true?
> 
> On 5/19/14, hiroshi leon <hiroshi_8712@hotmail.com> wrote:
>> Thanks Pat,
>> 
>> But how exactly can I run clusterdump using the --evaluate (-e) parameter?
>> When I try to run it, for example:
>> 
>> ./mahout clusterdump -i /user/Data-output/clusters-1-final -o analyze.txt
>> --evaluate
>> 
>> I get a Java NullPointerException:
>> 
>> 14/05/19 15:02:03 INFO common.AbstractJob: Command line arguments:
>> {--dictionaryType=[text],
>> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
>> --endPhase=[2147483647], --evaluate=null,
>> --input=[/user/Data-output/clusters-1-final], --output=[analyze.txt],
>> --outputFormat=[TEXT], --startPhase=[0], --tempDir=[temp]}
>> Exception in thread "main" java.lang.NullPointerException
>> 
>> Do I have to pass a value to --evaluate? As input for clusterdump I am
>> using the cluster output produced by running Mahout K-Means.
>> 
>>> Subject: Re: Mahout K-Means - Quality of the clusters
>>> From: pat.ferrel@gmail.com
>>> Date: Sat, 17 May 2014 09:43:59 -0700
>>> To: user@mahout.apache.org
>>> 
>>> mahout clusterdump --evaluate …
>>> 
>>> provides some stats
>>> 
>>> On May 15, 2014, at 10:23 PM, hiroshi leon <hiroshi_8712@hotmail.com>
>>> wrote:
>>> 
>>> Hello everybody,
>>> 
>>> Do you know how I can get the MSE of the clusters in Mahout K-Means?
>>> I would like to check the quality of the clusters. Thanks!
>>> 