mahout-user mailing list archives

From Ben West <bwsithspaw...@yahoo.com>
Subject LDA question
Date Mon, 05 Sep 2011 15:38:05 GMT
Hey all,

I'm trying out Latent Dirichlet Allocation (LDA) in Mahout. I created my term vectors as described here:
https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html using these commands:


~/Scripts/Mahout/trunk/bin/mahout seqdirectory --input /home/ben/Scripts/eipi/files --output /home/ben/Scripts/eipi/mahout_out -chunk 1
~/Scripts/Mahout/trunk/bin/mahout seq2sparse -i /home/ben/Scripts/eipi/mahout_out -o /home/ben/Scripts/eipi/termvecs -wt tf -seq

Then I run this, trying to follow these instructions: https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html

~/Scripts/Mahout/trunk/bin/mahout lda -i /home/ben/Scripts/eipi/termvecs -o /home/ben/Scripts/eipi/lda_working -k 2 -v 100

And I get:

MAHOUT-JOB: /home/ben/Scripts/Mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/04 16:28:59 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --input=/home/ben/Scripts/eipi/termvecs, --maxIter=-1, --numTopics=2, --numWords=100, --output=/home/ben/Scripts/eipi/lda_working, --startPhase=0, --tempDir=temp, --topicSmoothing=-1.0}
11/09/04 16:29:00 INFO lda.LDADriver: LDA Iteration 1
11/09/04 16:29:01 INFO input.FileInputFormat: Total input paths to process : 4
11/09/04 16:29:01 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-ben/mapred/staging/ben692167368/.staging/job_local_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/home/ben/Scripts/eipi/termvecs/tokenized-documents/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at ...


Does anyone know what I'm doing wrong?
