mahout-user mailing list archives

From "Fuhrmann Alpert, Galit" <galp...@ebay.com>
Subject RE: mahout kmeans not generating clusteredPoint dir?
Date Mon, 29 Jul 2013 06:49:27 GMT

Thanks. Was there ever a fix for this, or is it still an open issue?

-----Original Message-----
From: Stevo Slavić [mailto:sslavic@gmail.com] 
Sent: Saturday, July 27, 2013 1:27 AM
To: user@mahout.apache.org
Cc: Suneel Marthi
Subject: Re: mahout kmeans not generating clusteredPoint dir?

The current Mahout examples Cluster Reuters build has the same issue:
https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console

Kind regards,
Stevo Slavic.


On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit
<galpert@ebay.com> wrote:

>
> Thanks Suneel.
> I tried adding this flag (though I thought the clusteredPoints directory
> was supposed to be created by default).
> Either way, whenever I add '-cl' (I tried it on several data sets), I
> get the following error:
> "There is no queue named default"
> (even though I do specify a queue with -Dmapred.job.queue.name=...).
> I don't get this error otherwise.
>
> Has anyone ever encountered this error?
> Is there some sort of configuration I'm missing?
>
> Thanks,
>
> Galit.
>
> -----Original Message-----
> From: Suneel Marthi [mailto:suneel_marthi@yahoo.com]
> Sent: Wednesday, July 10, 2013 5:30 PM
> To: user@mahout.apache.org
> Subject: Re: mahout kmeans not generating clusteredPoint dir?
>
> It's been a while since I last worked with this, but I believe you are
> missing the clustering option '-cl'.
> Give that a try.
>
>
>
>
> ________________________________
> From: "Fuhrmann Alpert, Galit" <galpert@ebay.com>
> To: "user@mahout.apache.org" <user@mahout.apache.org>
> Sent: Wednesday, July 10, 2013 5:17 AM
> Subject: mahout kmeans not generating clusteredPoint dir?
>
>
> Hello,
>
> I ran mahout kmeans (using random seeds) on a Hadoop cluster. It ran
> successfully and created a directory containing clusters-*, including
> the last one, clusters-3-final.
> However, it did not create the clusteredPoints directory, or at least I
> cannot find it under the same dir (or anywhere else).
>
> My call was:
> mahout kmeans  -k 4000 -i inputSeq.dat -o outputPath --maxIter 3 
> --clusters outputSeeds
>
> Was there an extra argument I needed to specify in order for it to 
> generate the clusteredPoints?
> (BTW I also can't see the outputSeeds. Was it created for seeds and
> then deleted?)
>
> According to Mahout in Action:
>
> The k-means clustering implementation creates two types of directories
> in the output folder. The clusters-* directories are formed at the end
> of each iteration: the clusters-0 directory is generated after the
> first iteration, clusters-1 after the second iteration, and so on.
> These directories contain information about the clusters: centroid,
> standard deviation, and so on. The clusteredPoints directory, on the
> other hand, contains the final mapping from cluster ID to document ID.
> This data is generated from the output of the last MapReduce operation.
> The directory listing of the output folder looks something like this:
> $ ls -l reuters-kmeans-clusters
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
> ...
> drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoint
>
> Again, my call did not generate the clusteredPoint directory.
> I would appreciate your help.
>
> Thanks a lot,
>
> Galit.
>
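
The suggestion made in this thread, applied to the original call, can be sketched as below. This is only an illustration under stated assumptions: the paths are the placeholders from the original message, KMEANS_ARGS is a variable name invented here, and the thread does not confirm that '-cl' alone resolves the problem (the queue error reported above may still appear).

```shell
# Original call from the thread plus the '-cl' (clustering) flag suggested
# in the reply. Input and output paths are the placeholders from the
# original message; KMEANS_ARGS is a hypothetical helper variable.
KMEANS_ARGS="-i inputSeq.dat -o outputPath -k 4000 --maxIter 3 --clusters outputSeeds -cl"

if command -v mahout >/dev/null 2>&1; then
  # Left unquoted on purpose so the shell splits the string into arguments.
  mahout kmeans $KMEANS_ARGS
else
  # Mahout is not on the PATH: just print the command that would be run.
  echo "mahout kmeans $KMEANS_ARGS"
fi
```

If the run succeeds, the clusteredPoints directory would be expected under the path given with -o, next to the clusters-* directories.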
