mahout-user mailing list archives

From Mohammed Omer <>
Subject Re: Difficulties mapping results of CVB/LDA back to corresponding vector keys
Date Fri, 25 Apr 2014 13:39:35 GMT

I ended up using seqdumper on the docIndex file to retrieve the mapping of
rowid -> text key. This brought me a lot closer than where I was before!
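
As a rough sketch of that step, the dumped docIndex text can be loaded into a lookup table. This assumes each record in the dump is a line of the form `Key: <row id>: Value: <original key>`; that line shape, the function name `parse_doc_index`, and the sample paths are illustrative assumptions and should be checked against the actual dump.

```python
import re

# Assumed record shape in the text dump: "Key: <row id>: Value: <original key>".
LINE_RE = re.compile(r"^Key: (\d+): Value: (.*)$")


def parse_doc_index(lines):
    """Return {row_id: original_text_key} built from dumped docIndex lines."""
    mapping = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # non-record lines (headers, counts) are skipped
            mapping[int(match.group(1))] = match.group(2)
    return mapping


# Hypothetical sample input; real keys come from the original vectors.
sample = [
    "Key class: class org.apache.hadoop.io.IntWritable",
    "Key: 0: Value: /docs/a.rb",
    "Key: 1: Value: /docs/b.rb",
    "Count: 2",
]
doc_index = parse_doc_index(sample)
print(doc_index)
```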

However, now I have three files (contents here:

My first thought is to write my own map/reduce jobs to get a dataset
(key/value) which has the format: original_text_key => [term1, term2,
term3, term4]

Where the terms are selected from the topic which has the highest
probability of describing the document.

An example:

=> ["end","token","authentication","api","resource","def","devise","json","user","x"]
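
To make the intended join concrete, here is a small in-memory sketch in plain Python rather than map/reduce. It assumes the three dumps have already been parsed into dictionaries: row id to text key, row id to a list of per-topic probabilities, and topic id to term weights. The function name, the data shapes, and the sample values are all hypothetical, and it does not use any Mahout API.

```python
def top_terms_per_document(doc_index, doc_topics, topic_terms, n_terms=10):
    """Map each original text key to the top terms of its most probable topic.

    doc_index:   {row_id: original_text_key}
    doc_topics:  {row_id: [probability of topic 0, probability of topic 1, ...]}
    topic_terms: {topic_id: {term: weight}}
    """
    result = {}
    for row_id, probs in doc_topics.items():
        # Index of the topic with the highest probability for this document.
        best_topic = max(range(len(probs)), key=probs.__getitem__)
        # Terms of that topic, sorted by descending weight.
        ranked = sorted(
            topic_terms[best_topic].items(), key=lambda kv: kv[1], reverse=True
        )
        result[doc_index[row_id]] = [term for term, _ in ranked[:n_terms]]
    return result


# Hypothetical sample data, for illustration only.
doc_index = {0: "doc_a", 1: "doc_b"}
doc_topics = {0: [0.1, 0.9], 1: [0.7, 0.3]}
topic_terms = {
    0: {"user": 0.4, "json": 0.3, "api": 0.1},
    1: {"token": 0.5, "authentication": 0.2, "end": 0.05},
}
joined = top_terms_per_document(doc_index, doc_topics, topic_terms, n_terms=2)
print(joined)
```

At a scale where everything fits in memory this avoids extra jobs entirely; for larger outputs the same two lookups (best topic per document, then top terms per topic) would be the steps to distribute.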

Is there built-in functionality to do this, or is my plan of running
another 1 or 2 map-reduce jobs the way to go?

Thank you for your help again,


On Thu, Apr 24, 2014 at 6:52 PM, Suneel Marthi <> wrote:

> RowId creates a matrix and a docIndex, which are <IntWritable, VectorWritable>
> and <IntWritable, Text> respectively.
> Have you looked at ?
> On Thu, Apr 24, 2014 at 7:32 PM, Mohammed Omer <> wrote:
>> Good evening all.
>> This is my first time working with Mahout, and I'm really excited about
>> being able to stand on the shoulders of giants, thanks to your hard work
>> on the project.
>> I'm 90% of the way there with my current Mahout project, but that last 10%
>> is killing me.
>> Code is at if you want to skip
>> my explanation and go right to the commands I ran, etc.
>> Using a Lucene index and Mahout's robust CLI, I was able to generate
>> sequence files, build sparse vectors, convert the vector keys to integers,
>> and then run the CVB/LDA algorithm.
>> This worked great, and I was able to dump out the p(doc|topic) and
>> p(topic|term) results; but, I'm having a tough time figuring out how to use
>> the matrix generated by `mahout rowid` to map the documents and their
>> respective topic-assignments/probabilities back to their original text
>> vector keys.
>> Though I'm typically a Rubyist, I recently (last weekend) read and worked
>> through the entirety of Core Java vol 1, so I'm pretty comfortable with
>> Java. I am falling on my face at this last step, though.
>> I appreciate the eyes and help!
>> Thank you again,
>> Mo
