mahout-user mailing list archives

From <dar...@ontrenet.com>
Subject Complete canopy example?
Date Mon, 07 Mar 2011 16:57:03 GMT

Hi,
  I have a directory of text documents that I want to run canopy clustering
on (Mahout 0.4, standalone, no Hadoop).
I'm having some difficulty getting this to work. Is there a complete example
that covers every step?

Here is what I do:

Step 1$ ./bin/mahout seqdirectory -i INPUT_FILES/ -o FEED_SEQ -c UTF-8 -chunk 5

# My INPUT_FILES directory contains 1000 text files, yet the output FEED_SEQ
# contains only 1 tiny chunk with a file in it. Is that right?
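
One way to sanity-check Step 1 might be to compare the number of files in the input directory with what ended up in the output directory. The snippet below is only a rough sketch: `count_files` is a hypothetical helper (not a Mahout command), and the directory names are simply the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): count regular files under a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Directory names are taken from the commands in this message.
# Directories that do not exist are reported rather than treated as errors.
for dir in INPUT_FILES FEED_SEQ; do
    if [ -d "$dir" ]; then
        echo "$dir: $(count_files "$dir") file(s)"
    else
        echo "$dir: not found"
    fi
done
```

The count for the output directory includes every file found there, so it only shows whether anything was written, not whether all the documents made it into the chunk.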

Step 2$ ./bin/mahout seq2sparse -i FEED_SEQ -o FEED_VEC  --maxNGramSize 3

# This seems to generate a bit of output, with no errors.

Step 3$ ./bin/mahout canopy -i FEED_VEC -o FEED_CENTS -t1 1500 -t2 2000

Exception in thread "main" java.io.FileNotFoundException: File file:/home/darren/Downloads/mahout-distribution-0.4/FEED_VEC/tokenized-documents/data does not exist.
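
To see what Step 2 actually wrote, and whether the path named in the exception exists, something like the following could be used. This is a minimal sketch, not a Mahout feature: `list_subdirs` is a hypothetical helper, and it makes no assumption about which subdirectories seq2sparse creates.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): print each subdirectory of $1,
# followed by an indented list of the files it contains.
list_subdirs() {
    for sub in "$1"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$sub" ] || continue
        echo "$sub"
        find "$sub" -type f | sed 's/^/    /'
    done
}

# FEED_VEC is the output directory passed to seq2sparse in Step 2.
if [ -d FEED_VEC ]; then
    list_subdirs FEED_VEC
else
    echo "FEED_VEC: not found"
fi
```

Comparing this listing with the path in the exception would show whether the canopy step is being pointed at a directory that holds the data it expects.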

----

The Step 1 output looks suspicious to me:

$ ./bin/mahout seqdirectory -i INPUT_FILES/ -o FEED_SEQ  -c UTF-8 -chunk 5
no HADOOP_HOME set, running locally
Mar 7, 2011 11:57:14 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 847 ms

----

Darren
