mahout-user mailing list archives

From Corey Hyllested <corey.hylles...@gmail.com>
Subject Re: Error running mahout cvb
Date Tue, 09 Jul 2013 06:21:13 GMT
Agreed.


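After seq2sparse, you need to create a matrix with rowid. cvb expects
integer row keys, while the vectors seq2sparse writes are keyed by Text
document names.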

http://stackoverflow.com/questions/14757162/run-cvb-in-mahout-0-8

So, something like this:

mahout rowid -i $work_dir/input_seqparse/tf-vectors -o $work_dir/matrix

mahout cvb -i $work_dir/matrix/matrix -o $work_dir/lda_output -mt
$work_dir/lda_output/models -dt $work_dir/lda_output/docTopics -k 3
-nt 10 --maxIter 200
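
Putting the steps together, the whole sequence might be sketched roughly as below. The paths and option values are taken from the quoted script. The `run` helper and the `DRY_RUN` variable are hypothetical additions that only print the commands by default, and the sketch assumes rowid writes its result to a `matrix` file (next to a `docIndex` file) inside its output directory.

```shell
#!/bin/sh
# Sketch of the seqdirectory -> seq2sparse -> rowid -> cvb sequence.
# DRY_RUN and run() are hypothetical helpers, not part of Mahout.
work_dir=${work_dir:-/home/mahout}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pipeline() {
    run mahout seqdirectory --input "$work_dir/lda_input" \
        --output "$work_dir/input_seqfiles" -c UTF8 || return 1

    run mahout seq2sparse -i "$work_dir/input_seqfiles" \
        -o "$work_dir/input_seqparse" -wt tf || return 1

    # rowid replaces the Text keys with integer row IDs. Assumption: it
    # writes $work_dir/matrix/matrix and $work_dir/matrix/docIndex.
    run mahout rowid -i "$work_dir/input_seqparse/tf-vectors" \
        -o "$work_dir/matrix" || return 1

    # -nt 10 is copied from the quoted script; it is meant to be the
    # number of unique terms in the input vectors.
    run mahout cvb -i "$work_dir/matrix/matrix" -o "$work_dir/lda_output" \
        -mt "$work_dir/lda_output/models" \
        -dt "$work_dir/lda_output/docTopics" \
        -k 3 -nt 10 --maxIter 200 || return 1
}

pipeline
```

Running it as is only prints the four commands; setting `DRY_RUN=0` executes them in order and stops at the first failure.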


Unsolicited advice.

There is no reason to trash your sequence files (rm -rf
$work_dir/input_seqfiles) each time.

Provide a model location (-mt); this allows the computation to pick up
where it left off if something goes awry.
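
As a rough illustration of the first point, the script could rebuild the sequence files only when they are missing. This is a sketch under assumptions: `needs_rebuild` is a hypothetical helper, and it checks the local filesystem, so for paths on HDFS something like `hadoop fs -test -d` would be needed instead.

```shell
#!/bin/sh
# Sketch: regenerate the sequence files only when they are missing.
# needs_rebuild is a hypothetical helper that checks the local filesystem.
work_dir=${work_dir:-/home/mahout}

# Succeeds when the directory is missing or empty, i.e. the step must run.
needs_rebuild() {
    [ ! -d "$1" ] || [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

seqfiles="$work_dir/input_seqfiles"
if needs_rebuild "$seqfiles"; then
    if command -v mahout >/dev/null 2>&1; then
        mahout seqdirectory --input "$work_dir/lda_input" \
            --output "$seqfiles" -c UTF8
    else
        echo "mahout not on PATH; skipping seqdirectory" >&2
    fi
else
    echo "reusing existing sequence files in $seqfiles"
fi
```

In the same spirit, leaving the -mt directory in place between runs is what should let cvb find its earlier per-iteration models.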


- Corey


On Mon, Jul 8, 2013 at 10:43 PM, Gmail <giggs102@gmail.com> wrote:

> Hi
>
> I am trying to run the mahout cvb on hadoop cluster using some text files
> as input . I am getting the following error :
>
> Exception in thread "main" java.lang.IllegalStateException: No part files
> found in model path 'temp/topicModelState/model-1'
>
> My script for running mahout cvb looks like this :
>
> export work_dir=/home/mahout
>
> rm -rf $work_dir/input_seqfiles
>
> ./mahout seqdirectory --input $work_dir/lda_input --output
> $work_dir/input_seqfiles -c UTF8
>
> rm -rf $work_dir/input_seqparse
>
> ./mahout seq2sparse -i $work_dir/input_seqfiles -o
> $work_dir/input_seqparse -wt tf
>
> ./mahout cvb -i $work_dir/input_seqparse -o $work_dir/lda_output -k 3 -nt
> 10 --maxIter 200
>
>
> Is there something i am missing ? Any help or suggestion is greatly
> appreciated .
>
> Thanks
>
>
