mahout-user mailing list archives

From Avishay Livne1 <AVISH...@il.ibm.com>
Subject Re: extract p(doc|topic) from LDA
Date Mon, 07 Jun 2010 12:06:41 GMT
I modified
$MAHOUT_HOME/utils/src/main/java/org/apache/mahout/clustering/lda/LDAPrintTopics.java
so that the score is printed alongside each word, but I am not sure how to
interpret the scores.
I see values in the range of -8 to +6. I assumed the values would represent
P(word | topic) or log(P(word | topic)), but neither fits this range: a
probability lies in [0, 1], and a log-probability is never positive.
How should I interpret these values? Is there a simple way to retrieve
P(word | topic)?
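
One possible reading, offered only as an assumption and not confirmed anywhere in this thread, is that the printed scores are unnormalized log weights. If that is the case, P(word | topic) could be recovered by exponentiating and normalizing over the topic's whole vocabulary (not just the top words that get printed). The sketch below shows that normalization with the log-sum-exp trick; the class name, method name, and sample scores are hypothetical and are not part of Mahout.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout.
// ASSUMPTION: the scores for one topic are unnormalized log weights.
final class TopicScoreNormalizer {

    /** Converts one topic's log scores into probabilities that sum to 1. */
    static double[] toProbabilities(double[] logScores) {
        // Subtract the maximum before exponentiating to avoid overflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores) {
            sum += Math.exp(s - max);
        }
        // log of the normalizing constant: log(sum_i exp(score_i))
        double logNorm = max + Math.log(sum);

        double[] probs = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            probs[i] = Math.exp(logScores[i] - logNorm);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Made-up scores spanning the range reported above (-8 to +6).
        double[] scores = {6.0, 1.5, -8.0};
        System.out.println(Arrays.toString(toProbabilities(scores)));
    }
}
```

If the resulting values for a topic look implausible (for example, one word taking almost all of the mass in every topic), the assumption about the scores is probably wrong.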

Thanks,
Avishay.


                                                                                         
                                          
From:       Avishay Livne1/Haifa/IBM@IBMIL
To:         user@mahout.apache.org
Date:       06/06/2010 03:16 PM
Subject:    extract p(doc|topic) from LDA

Hi,

I'm trying to use LDA for a collaborative filtering task, where I need to
predict the rating a user (document) will give to a movie (word).
I ran LDA and constructed T topics, but I can only print the most frequent
words (movies) per topic.
Is it possible to extract p(document|topic) or p(word|topic) from LDA's
output? (document = new user, word = movie.)
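
To make the goal concrete: in the standard LDA model, the probability of a word (movie) for a document (user) is a mixture over topics, p(word|doc) = sum over t of p(topic=t|doc) * p(word|topic=t). The sketch below computes that sum. It assumes both distributions are already available as plain arrays, which is exactly what I do not yet know how to get out of Mahout; the class name, method name, and numbers are hypothetical.

```java
// Hypothetical helper, not part of Mahout.
// ASSUMPTION: topicGivenDoc[t] = p(topic t | doc) and
// wordGivenTopic[t][w] = p(word w | topic t) have been obtained somehow.
final class MovieScorer {

    /** Returns p(word | doc) as a mixture over topics. */
    static double wordGivenDoc(double[] topicGivenDoc,
                               double[][] wordGivenTopic,
                               int word) {
        double p = 0.0;
        for (int t = 0; t < topicGivenDoc.length; t++) {
            p += topicGivenDoc[t] * wordGivenTopic[t][word];
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up example: 2 topics, 2 movies.
        double[] topicGivenDoc = {0.5, 0.5};
        double[][] wordGivenTopic = {
            {0.2, 0.8},
            {0.6, 0.4}
        };
        System.out.println(wordGivenDoc(topicGivenDoc, wordGivenTopic, 0));
    }
}
```

This yields a relative preference score for each movie rather than a rating on a fixed scale, so some mapping to ratings would still be needed.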

Best regards,
Avishay




