mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Command line : Error using clusterdump after cvb (0.7)
Date Wed, 14 Nov 2012 19:12:16 GMT
Clusterdump doesn't work on LDA output, as LDA doesn't produce "cluster"
objects.

If you want to look at the topics for CVB, use vectordump:


mahout vectordump -s <path to topics sequence file> \
  --dictionary <path to dictionary.file-0> \
  --dictionaryType seqfile \
  --vectorSize <num entries per topic you want to see> \
  -sort
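
As a rough illustration, here is the same command filled in with the paths from the quoted steps below. This is a dry-run sketch that only prints the command. The paths and the value 10 are assumptions taken from steps 2 and 4; depending on your setup, -s may need to point at a specific file inside the topics output rather than the directory itself.

```shell
#!/bin/sh
# Dry run: build and print the vectordump command instead of executing it.
# All three values below are assumptions based on the quoted commands.
topics="topics"                          # cvb -o output from step 4 (assumed)
dict="sparsevectors/dictionary.file-0"   # dictionary from seq2sparse, step 2 (assumed)
terms_per_topic=10                       # number of entries to show per topic

cmd="mahout vectordump -s $topics --dictionary $dict --dictionaryType seqfile --vectorSize $terms_per_topic -sort"

# Print the command so it can be checked before running it by hand.
echo "$cmd"
```

Once the printed command looks right for your directory layout, run it directly.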



On Wed, Nov 14, 2012 at 10:22 AM, Jérémie Gomez <jeremie.gomez@gmail.com> wrote:

> Hi everyone,
>
> I have tried several of the clustering algorithms in mahout and they worked
> great, but I have a problem with the cvb implementation of Latent Dirichlet
> Allocation. The cvb command works fine, but then using clusterdump gives me
> the following error:
>
> Exception in thread "main" java.lang.ClassCastException:
> org.apache.mahout.math.VectorWritable cannot be cast to
> org.apache.mahout.clustering.iterator.ClusterWritable
>
> What I do, in detail:
> 1) mahout seqdirectory -c UTF-8 -i inputdir -o sequencefiles
> 2) mahout seq2sparse -i sequencefiles -o sparsevectors -ow -a
> org.apache.lucene.analysis.WhitespaceAnalyzer -x 99 -wt tfidf -s 5 -md 1 -x
> 90 -ng 2 -ml 50 -seq -n 2
> 3) mahout rowid -i sparsevectors/tf-vectors -o rowidresult
> 4) mahout cvb -i rowidresult/matrix -dict
> sparsevectors/dictionary.file-0 -o topics -dt documents -mt states -ow -k
> 10
> 5) mahout clusterdump -i topics -o clusters -of TEXT -n 10 -d
> marcelproust/dictionary.file-0 -dt sequencefile
>
> When I run command 5, I get the error above. Unfortunately, I could not
> find any working solution after searching the archives, so I thought I'd ask
> the community!
>
> Thanks a lot in advance.
> Jeremie
>



-- 

  -jake
