mahout-user mailing list archives

From Mohammed Omer <beancinemat...@gmail.com>
Subject Difficulties mapping results of CVB/LDA back to corresponding vector keys
Date Thu, 24 Apr 2014 23:32:57 GMT
Good evening all.

This is my first time working with Mahout, and I'm really excited about
being able to stand on the shoulders of giants, thanks to your hard work on
the project.

I'm 90% of the way there with my current Mahout project, but that last 10%
is killing me.

Code is at https://github.com/momer/mahout_difficulties if you want to skip
my explanation and go right to the commands I ran, etc.

Using a Lucene index and Mahout's robust CLI, I was able to generate
sequence files, build sparse vectors, convert the vector keys to integers,
and then run the CVB/LDA algorithm.

This worked great, and I was able to dump out the p(topic|doc) and
p(term|topic) results, but I'm having a tough time figuring out how to use
the matrix generated by `mahout rowid` to map the documents and their
respective topic assignments/probabilities back to their original text
vector keys.
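
To make the mapping problem concrete, here is a minimal sketch of the join
in question. It assumes that `mahout rowid` writes a `docIndex` file pairing
each integer row id with the original text key, and that the per-document
topic output from CVB is keyed by those same integer ids. The class and
method names (`DocTopicJoin`, `joinByRowId`, `mostLikelyTopic`) and the
sample data are hypothetical, and plain in-memory maps stand in for the
sequence files so that the example runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: not part of Mahout. The maps stand in for the
// docIndex file (row id -> original key) and the doc-topic output
// (row id -> topic probabilities), which in practice are sequence files.
class DocTopicJoin {

    // Re-keys each doc-topic row by its original text key, using the
    // shared integer row id as the join column.
    static Map<String, double[]> joinByRowId(Map<Integer, String> docIndex,
                                             Map<Integer, double[]> docTopics) {
        Map<Integer, double[]> sortedRows = new TreeMap<>(docTopics);
        Map<String, double[]> byKey = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> row : sortedRows.entrySet()) {
            String originalKey = docIndex.get(row.getKey());
            if (originalKey == null) {
                throw new IllegalStateException(
                        "No docIndex entry for row id " + row.getKey());
            }
            byKey.put(originalKey, row.getValue());
        }
        return byKey;
    }

    // Returns the index of the largest probability in the row.
    static int mostLikelyTopic(double[] topicProbabilities) {
        int best = 0;
        for (int t = 1; t < topicProbabilities.length; t++) {
            if (topicProbabilities[t] > topicProbabilities[best]) {
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Map<Integer, String> docIndex = new TreeMap<>();
        docIndex.put(0, "doc-a");
        docIndex.put(1, "doc-b");

        Map<Integer, double[]> docTopics = new TreeMap<>();
        docTopics.put(0, new double[] {0.1, 0.9});
        docTopics.put(1, new double[] {0.7, 0.3});

        Map<String, double[]> joined = joinByRowId(docIndex, docTopics);
        for (Map.Entry<String, double[]> e : joined.entrySet()) {
            System.out.println(
                    e.getKey() + " -> topic " + mostLikelyTopic(e.getValue()));
        }
    }
}
```

With real output, the two maps would presumably be filled by reading the
sequence files (for example with Hadoop's `SequenceFile.Reader`) rather than
by hand; that reading step is left out here.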

Though I'm typically a Rubyist, I recently (last weekend) read and worked
through the entirety of Core Java vol. 1, so I'm pretty comfortable with
Java. I am falling on my face at this last step, though.

I appreciate the eyes and help!

Thank you again,

Mo