mahout-user mailing list archives

From Suneel Marthi <smar...@apache.org>
Subject Re: Difficulties mapping results of CVB/LDA back to corresponding vector keys
Date Thu, 24 Apr 2014 23:52:49 GMT
RowId creates a matrix and a docIndex, which are <IntWritable, VectorWritable>
and <IntWritable, Text> respectively.
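A minimal plain-Java sketch of the relationship between those two outputs. It assumes rowid assigns sequential integer ids starting at 0 in input order: each original Text key goes into docIndex and its vector goes into the matrix under the same id. Plain maps stand in for the SequenceFiles so there is no Hadoop/Mahout dependency; the class name, method name, and sample paths are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowIdSketch {
    // The two outputs described above: int-keyed rows and the int -> original-key index.
    static final class Result {
        final Map<Integer, double[]> matrix = new LinkedHashMap<>();
        final Map<Integer, String> docIndex = new LinkedHashMap<>();
    }

    // Assigns sequential ids (assumed to start at 0) in input order; the same id
    // keys both the vector in the matrix and its original key in docIndex.
    static Result rowId(Map<String, double[]> vectors) {
        Result result = new Result();
        int id = 0;
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            result.docIndex.put(id, e.getKey());
            result.matrix.put(id, e.getValue());
            id++;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = new LinkedHashMap<>(); // hypothetical sample documents
        vectors.put("/docs/a.txt", new double[] {1.0, 0.0, 2.0});
        vectors.put("/docs/b.txt", new double[] {0.0, 3.0, 1.0});
        System.out.println(rowId(vectors).docIndex); // {0=/docs/a.txt, 1=/docs/b.txt}
    }
}
```

Because the integer ids only mean something relative to this docIndex, any integer-keyed result computed from the matrix has to be mapped back through it to recover the original keys.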

Have you looked at LDAPrintTopics.java?


On Thu, Apr 24, 2014 at 7:32 PM, Mohammed Omer <beancinematics@gmail.com> wrote:

> Good evening all.
>
> This is my first time working with Mahout, and I'm really excited about
> being able to stand on the shoulders of giants, thanks to your hard work on
> the project.
>
> I'm 90% of the way there with my current Mahout project, but that last 10%
> is killing me.
>
> Code is at https://github.com/momer/mahout_difficulties if you want to skip
> my explanation and go right to the commands I ran, etc.
>
> Using a Lucene index and Mahout's robust CLI, I was able to generate
> sequence files and sparse vectors, convert the vector keys to integers,
> and then run the CVB/LDA algorithm.
>
> This worked great, and I was able to dump out the p(topic|doc) and
> p(term|topic) results, but I'm having a tough time figuring out how to use
> the matrix generated by `mahout rowid` to map the documents and their
> respective topic assignments/probabilities back to their original text
> vector keys.
>
> Though I'm typically a Rubyist, I recently (last weekend) read/worked
> through the entirety of Core Java Vol. 1, so I'm pretty comfortable with
> Java. I am falling on my face at this last step, though.
>
> I appreciate the eyes and help!
>
> Thank you again,
>
> Mo
>
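The mapping step asked about in the quoted message can be sketched as a join on the integer row id: look up each document-topic row's id in docIndex to recover the original key. This assumes CVB's document-topic output is keyed by the same ids that rowid assigned. Plain maps stand in for the SequenceFiles, which in practice would be read with something like Hadoop's SequenceFile.Reader; the class and method names, paths, and probabilities are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocTopicJoin {
    // Joins doc-topic rows (keyed by rowid's int ids) back to the original
    // document keys via docIndex. A row whose id is missing from docIndex
    // raises an error instead of being silently dropped.
    static Map<String, double[]> join(Map<Integer, String> docIndex,
                                      Map<Integer, double[]> docTopics) {
        Map<String, double[]> byKey = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> row : docTopics.entrySet()) {
            String key = docIndex.get(row.getKey());
            if (key == null) {
                throw new IllegalArgumentException("No docIndex entry for row " + row.getKey());
            }
            byKey.put(key, row.getValue());
        }
        return byKey;
    }

    // Index of the highest-probability topic in one document's distribution.
    static int topTopic(double[] dist) {
        int best = 0;
        for (int t = 1; t < dist.length; t++) {
            if (dist[t] > dist[best]) best = t;
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Integer, String> docIndex = new LinkedHashMap<>(); // hypothetical sample data
        docIndex.put(0, "/docs/a.txt");
        docIndex.put(1, "/docs/b.txt");
        Map<Integer, double[]> docTopics = new LinkedHashMap<>();
        docTopics.put(0, new double[] {0.7, 0.2, 0.1});
        docTopics.put(1, new double[] {0.1, 0.3, 0.6});
        for (Map.Entry<String, double[]> e : join(docIndex, docTopics).entrySet()) {
            // prints "/docs/a.txt -> topic 0" then "/docs/b.txt -> topic 2"
            System.out.println(e.getKey() + " -> topic " + topTopic(e.getValue()));
        }
    }
}
```

Failing loudly on a missing id is a deliberate choice: it surfaces a docIndex that does not belong to the matrix the model was trained on, which is an easy mistake when several rowid runs share an output directory.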
