mahout-user mailing list archives

From Stevo Slavić <ssla...@gmail.com>
Subject Re: mahout kmeans not generating clusteredPoint dir?
Date Fri, 26 Jul 2013 22:26:32 GMT
The current Mahout examples cluster Reuters build has the same issue:
https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console

Kind regards,
Stevo Slavic.


On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit
<galpert@ebay.com> wrote:

>
> Thanks Suneel.
> I tried to add this flag (though I thought the clusteredPoints directory was
> supposed to be created by default?).
> Either way, for some reason whenever I add '-cl' (tried to run it on
> several data sets), I get the following error:
> "There is no queue named default"
> (even though I do specify a queue by -Dmapred.job.queue.name=...).
> I don't get this error otherwise.
>
> Has anyone ever encountered this error?
> Is there some sort of configuration I'm missing?
>
> Thanks,
>
> Galit.
>
> -----Original Message-----
> From: Suneel Marthi [mailto:suneel_marthi@yahoo.com]
> Sent: Wednesday, July 10, 2013 5:30 PM
> To: user@mahout.apache.org
> Subject: Re: mahout kmeans not generating clusteredPoint dir?
>
> It's been a while since I last worked with this, but I believe you are
> missing the clustering option '-cl'.
> Give that a try.
>
>
>
>
> ________________________________
> From: "Fuhrmann Alpert, Galit" <galpert@ebay.com>
> To: "user@mahout.apache.org" <user@mahout.apache.org>
> Sent: Wednesday, July 10, 2013 5:17 AM
> Subject: mahout kmeans not generating clusteredPoint dir?
>
>
> Hello,
>
> I ran mahout kmeans (using random seeds) on a Hadoop cluster. It ran
> successfully and created a directory containing clusters-*, including the
> last, which was clusters-3-final.
> However, it did not create the clusteredPoints, or at least I cannot find
> it under the same dir (or anywhere else).
>
> My call was:
> mahout kmeans  -k 4000 -i inputSeq.dat -o outputPath --maxIter 3
> --clusters outputSeeds
>
> Was there an extra argument I needed to specify in order for it to
> generate the clusteredPoints?
> (BTW I also can't see the outputSeeds. Was it created for seeds and then
> deleted?)
>
> According to Mahout in Action:
>
> The k-means clustering implementation creates two types of directories in
> the output folder. The clusters-* directories are formed at the end of each
> iteration: the clusters-0 directory is generated after the first iteration,
> clusters-1 after the second iteration, and so on. These directories contain
> information about the clusters: centroid, standard deviation, and so on. The
> clusteredPoints directory, on the other hand, contains the final mapping from
> cluster ID to document ID. This data is generated from the output of the last
> MapReduce operation.
> The directory listing of the output folder looks something like this:
> $ ls -l reuters-kmeans-clusters
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
> drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
> ...
> drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoints
>
> Again, my call did not generate the clusteredPoints directory.
> I would appreciate your help.
>
> Thanks a lot,
>
> Galit.
>
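
For reference, the suggestion made in this thread (adding the clustering option '-cl' to the original call) can be sketched as below. This is only a sketch: it reuses the arguments from the original post unchanged, and it only assembles and prints the command rather than running it. The commented-out listing at the end assumes that the points end up in a clusteredPoints directory under the output path, as the book excerpt describes. The queue problem reported in the follow-up is not addressed here.

```shell
#!/bin/sh
# Sketch: rebuild the k-means call from the original post and append '-cl',
# the clustering option suggested in the reply.
# The three paths below are the ones used in the original call.
INPUT="inputSeq.dat"
OUTPUT="outputPath"
SEEDS="outputSeeds"

# Same arguments as the original call, plus '-cl' at the end.
CMD="mahout kmeans -k 4000 -i $INPUT -o $OUTPUT --maxIter 3 --clusters $SEEDS -cl"

# Print the command instead of running it, so it can be reviewed first.
echo "$CMD"

# Assumption: after a successful run with '-cl', the point-to-cluster
# assignments are written under $OUTPUT/clusteredPoints, so a listing like
# the following could be used to check for it:
#   hadoop fs -ls "$OUTPUT/clusteredPoints"
```

Printing the command first makes it easy to compare against the original call and confirm that '-cl' is the only change before submitting the job.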
