mahout-user mailing list archives

From Norlan Eph <>
Subject Issue about MAHOUT CVB output
Date Mon, 28 Oct 2013 14:07:13 GMT
Dear friends,
       I am trying to do some text mining with topic models in Mahout.
I tried the LDA example and got its output,
but I have trouble understanding the data in this output text, such as:

      In my opinion, this should be the doc-term distribution, namely each
doc's probability for each topic word, and the number before the
colon (like the 0.06, 0.10, 0.007050 in doc 2) should be the index of
the original word in the dictionary that was built when we invoked
seq2sparse. Is this right? If so, how can I translate the index back into the
original word, which would make the output easier to understand and to use
further? If not, can you explain this output data for me? Many thanks!
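
      To make the question concrete, here is a minimal sketch of the
translation I have in mind. It is only an illustration under assumptions:
the `{index:weight,...}` text layout, the sample numbers, the helper names
`parse_sparse_vector` and `label_vector`, and the three-word dictionary are
all hypothetical, and none of this is a Mahout API.

```python
def parse_sparse_vector(text):
    """Parse a string such as '{0:0.06,5:0.10}' into {index: weight}.

    The '{index:weight,...}' layout is an assumption about the dumped text.
    """
    body = text.strip().strip("{}")
    pairs = {}
    for item in body.split(","):
        if not item.strip():
            continue  # skip empty fragments, e.g. from an empty vector '{}'
        index, weight = item.split(":")
        pairs[int(index)] = float(weight)
    return pairs


def label_vector(pairs, dictionary):
    """Replace each index with its dictionary word, highest weight first."""
    labeled = [(dictionary.get(i, f"<unknown:{i}>"), w) for i, w in pairs.items()]
    return sorted(labeled, key=lambda pair: pair[1], reverse=True)


# Hypothetical dictionary (index -> word); in practice it would be loaded
# from whatever dictionary file the vectorization step produced.
dictionary = {0: "apple", 1: "banana", 2: "cherry"}

# Hypothetical line of output in the assumed layout.
vector_text = "{0:0.06,1:0.10,2:0.00705}"

for word, weight in label_vector(parse_sparse_vector(vector_text), dictionary):
    print(word, weight)
```

If the indices really are dictionary positions, something along these lines
would give word/weight pairs that are easier to read.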
     By the way, any relevant advice is appreciated!

From Norlan
