mahout-user mailing list archives

From Geek Gamer <geek4...@gmail.com>
Subject Re: Check the input files present in cluster
Date Wed, 06 Apr 2011 06:42:57 GMT
How are you preparing the vectors? The dump identifies the cluster members
by document only if these are named vectors. You can prepare named vectors
from a sequence file using:
$MAHOUT_HOME/bin/mahout seq2sparse

Add the parameter --namedVector to that command to create named vectors; the
same clusterdump command will then show which documents are in each cluster.
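As a rough sketch, the whole sequence with named vectors switched on might look like the script below. It assumes the raw text files sit in a hypothetical mytest/input directory, and k=3 with 10 iterations is chosen only for illustration. The flags are those of the Mahout 0.4/0.5 command-line drivers as I understand them, so check `mahout <command> --help` for your version. The EXECUTE switch and the fallback Mahout path are the sketch's own assumptions, not Mahout settings.

```shell
#!/bin/sh
# Sketch only: rebuild the vectors as NamedVectors, re-run k-means, dump clusters.
# Hypothetical assumptions: raw text files in mytest/input, k=3, 10 iterations,
# Mahout under $MAHOUT_HOME (the fallback path below is a guess).
# Commands are printed, not executed, unless EXECUTE=1 is set.
set -eu

MAHOUT="${MAHOUT_HOME:-/usr/local/mahout}/bin/mahout"
BASE="mytest"

# Print the command, or run it when EXECUTE=1.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

pipeline() {
  # 1. Text files -> SequenceFile keyed by document path.
  run "$MAHOUT" seqdirectory -i "$BASE/input" -o "$BASE/seqdir"
  # 2. TF-IDF vectors; --namedVector stores each document key as the vector name.
  run "$MAHOUT" seq2sparse -i "$BASE/seqdir" -o "$BASE/seqdir-sparse" --namedVector
  # 3. k-means with k random seeds written to -c; -cl also writes clusteredPoints.
  run "$MAHOUT" kmeans -i "$BASE/seqdir-sparse/tfidf-vectors" \
    -c "$BASE/kmeans/initial" -o "$BASE/kmeans" -k 3 -x 10 -cl -ow
  # 4. Same clusterdump as before; point -s at the last clusters-N directory.
  run "$MAHOUT" clusterdump -s "$BASE/kmeans/clusters-1" \
    -p "$BASE/kmeans/clusteredPoints" -d "$BASE/seqdir-sparse/dictionary.file-0" \
    -dt sequencefile -n 20 -o Desktop/ClusterDump/Kmeans/cl1.txt
}

pipeline
```

Printing by default keeps the sketch safe to read through first; once the paths match your setup, `EXECUTE=1 sh pipeline.sh` would run the four steps in order.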
Hope this helps.
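Once the dump shows named points, a small post-processing step can turn it into one `cluster<TAB>point` line per member, which makes it easy to grep for a given cluster. The helper below, cluster_members, is hypothetical (it is not part of Mahout) and assumes the text layout seen in this thread: one header line per cluster such as CL-0{n=...} or VL-1{n=...}, followed by point lines of the form `1.0: ...`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): print one "cluster<TAB>point" line
# per point in a clusterdump text file. Assumes the layout seen in this thread:
# a header line like CL-0{n=...} or VL-1{n=...} per cluster, then point lines
# of the form "1.0: ...". With named vectors the point text should carry the name.
set -eu

cluster_members() {
  awk '
    # Cluster header: keep the identifier before the first "{".
    /^[A-Z]+-[0-9]+[{]/ { id = substr($0, 1, index($0, "{") - 1); next }
    # Point line: drop the leading "weight: " and tag it with the cluster id.
    id != "" && /^[ \t]*[0-9.]+: / {
      point = $0
      sub(/^[ \t]*[0-9.]+: /, "", point)
      print id "\t" point
    }
  ' "$1"
}

# When run as a script with a dump file argument, list all members.
if [ $# -gt 0 ]; then
  cluster_members "$1"
fi
```

For example, saving it as members.sh and running `sh members.sh Desktop/ClusterDump/Kmeans/cl1.txt | grep '^VL-1'` should list only the members of VL-1.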


On Wed, Apr 6, 2011 at 9:23 AM, Madhusudan Joshi <madhusudanrjoshi@gmail.com> wrote:

> The command I used to dump the clusters is:
>
> mahout clusterdump -s mytest/kmeans/clusters-1 -p mytest/kmeans/clusteredPoints \
>   -d mytest/seqdir-sparse/dictionary.file-0 -dt sequencefile -n 20 \
>   -o Desktop/ClusterDump/Kmeans/cl1.txt
>
> I tried the Reuters example and then clustered my own sample files. The
> output for my sample files is:
>
> CL-0{n=2 c=[article:3.009, first:3.279, third:3.279] r=[first:3.279, third:3.279]}
>    Top Terms:
>        third                                   =>  3.2787654399871826
>        first                                   =>  3.2787654399871826
>        article                                 =>  3.0087521076202393
>    Weight:  Point:
>    1.0: [article:3.009, first:6.558]
>    1.0: [article:3.009, third:6.558]
> VL-1{n=1 c=[article:3.009, second:6.558] r=[article:0.000, first:0.000, fourth:0.000, second:0.000, third:0.000]}
>    Top Terms:
>        second                                  =>   6.557530879974365
>        article                                 =>  3.0087521076202393
>    Weight:  Point:
>    1.0: [article:3.009, second:6.558]
> VL-3{n=1 c=[article:3.009, fourth:6.558] r=[article:0.000, first:0.000, fourth:0.000, second:0.000, third:0.000]}
>    Top Terms:
>        fourth                                  =>   6.557530879974365
>        article                                 =>  3.0087521076202393
>    Weight:  Point:
>    1.0: [article:3.009, fourth:6.558]
>
> The output shows the number of documents in each cluster, but not which
> documents they are. I need to be able to check which documents are present
> in any given cluster.
>
> On Tue, Apr 5, 2011 at 11:34 PM, Jeff Eastman <jeastman@narus.com> wrote:
>
> > You are going to have to be much more explicit about which command-line
> > invocations you ran and what results you got for anybody to be able to
> > help you much here. Have you tried the clustering examples in the wiki?
> >
> > -----Original Message-----
> > From: Madhusudan Joshi [mailto:madhusudanrjoshi@gmail.com]
> > Sent: Monday, April 04, 2011 10:23 PM
> > To: user@mahout.apache.org
> > Subject: Check the input files present in cluster
> >
> > Hi,
> >
> > I am new to Mahout and trying out clustering. I created clusters using
> > k-means from bash. I want to know which files are present in a given
> > cluster. I tried looking for this in the cluster dumper but didn't find
> > what I needed. Can anyone help me with this?
> >
> > Thanks.
> >
> > --
> > Everything we hear is an opinion, not a fact.
> > Everything we see is perspective, not the truth.
> >
>
>
