mahout-user mailing list archives

From "Allen, Ronald L." <allen...@ornl.gov>
Subject RE: seqdumper output?
Date Fri, 21 Feb 2014 15:41:19 GMT
Hey again,

I was able to figure out a way to get my CSV file clustered.  For now it is a very rough process.
I will refine the steps I took and post what I did on the list, hopefully in a week or so.

Thanks for all the help!

Ronald
________________________________________
From: Suneel Marthi [suneel_marthi@yahoo.com]
Sent: Tuesday, February 11, 2014 5:44 PM
To: user@mahout.apache.org
Subject: Re: seqdumper output?

You should run clusterdump on /home/r9r/seqTest/seqKmeans/clusters-1-final/part-xxxxx
to see the points that are in the cluster.
But you need a dictionary for that, which wouldn't be available if the vectors were generated
from CSV.

So one way to generate a dictionary for a CSV and verify the clustering output would be to
go through the process below:

1. Convert the CSV file to a Lucene index (see http://glaforge.appspot.com/article/lucene-s-fun
for sample code).
2. Run the Lucene index from (1) through Mahout's lucene2seq utility - this converts the Lucene
indexes into sequence files.
3. Run the output of (2) through seq2sparse - this should generate tf-idf vectors, a dictionary,
tf-vectors and wordcounts.
4. Run the output of (3) through the KMeans Driver.

Please give this a try.
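
To illustrate why the dictionary matters, here is a minimal Python sketch. It is not Mahout code: the function name top_terms and all of the data are made up for illustration. It assumes a cluster centroid is a sparse vector of feature index -> weight, and that a dictionary maps each feature index back to its term. Without the dictionary, only raw indices can be reported.

```python
# Hypothetical illustration only: this is NOT Mahout's clusterdump code.
# A "dictionary" maps each vector index to the term it stands for.

def top_terms(centroid, dictionary, n=3):
    """Return the n highest-weighted (label, weight) pairs of a centroid.

    centroid:   dict of feature index -> weight (a sparse vector)
    dictionary: dict of feature index -> term, or None if unavailable
    """
    ranked = sorted(centroid.items(), key=lambda kv: kv[1], reverse=True)[:n]
    if dictionary is None:
        # Without a dictionary only the raw indices can be shown.
        return [(str(index), weight) for index, weight in ranked]
    # Fall back to the index for any feature missing from the dictionary.
    return [(dictionary.get(index, str(index)), weight) for index, weight in ranked]


# Made-up data for the example.
dictionary = {0: "mahout", 1: "cluster", 2: "csv", 3: "vector"}
centroid = {0: 0.10, 1: 0.75, 2: 0.40, 3: 0.05}

print(top_terms(centroid, dictionary))  # labelled with terms
print(top_terms(centroid, None))        # labelled with bare indices
```

The first call labels the heaviest centroid features with their terms; the second shows what is left when the vectors come straight from a CSV and no dictionary exists.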





On Tuesday, February 11, 2014 3:33 PM, "Allen, Ronald L." <allenrl1@ornl.gov> wrote:

Hello,

I have done something wrong with clustering a CSV file and can't quite figure it out.  I am
using Mahout 0.9 on a local machine only.  Below is the output from seqdumper, and I am not
sure how to interpret it.  Can anyone help?

Input Path: file:/home/r9r/seqTest/seqKmeans/clusters-1-final/_policy
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.mahout.clustering.iterator.ClusteringPolicyWritable
Key: : Value: org.apache.mahout.clustering.iterator.ClusteringPolicyWritable@78be9eb3
Count: 1
Input Path: file:/home/r9r/seqTest/seqKmeans/clusters-1-final/part-00000
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.iterator.ClusterWritable
Key: 0: Value: org.apache.mahout.clustering.iterator.ClusterWritable@592ea0f8
Count: 1
Input Path: file:/home/r9r/seqTest/seqKmeans/clusters-1-final/part-00001
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.iterator.ClusterWritable
Key: 1: Value: org.apache.mahout.clustering.iterator.ClusterWritable@44a2786
Count: 1

There's a good chance I am still not getting my CSV data into a usable form.
I can get it into a sequence file, but this is the output.

Thanks,
Ronald
