mahout-user mailing list archives

From David Hall <d...@cs.berkeley.edu>
Subject Re: Probability from log likelihood in LDA output
Date Tue, 07 Dec 2010 04:46:04 GMT
Hi,

The scores aren't (log) normalized until they're loaded in the map
phase. Take a look at LDAState. The array

private final double[] logTotals; // log \sum p(w|t) for topic=1..nTopics

in LDAState has normalization constants.  The method
logProbWordGivenTopic is intended for access...  LDADriver#createState
is a roundabout way of creating an LDAState.
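
To illustrate the normalization described above, here is a minimal sketch, not Mahout's actual code. It assumes you have the raw (unnormalized) log scores for every word under one topic, computes the normalization constant log \sum_w exp(score) (the role logTotals plays per topic), and subtracts it before calling exp. The class and method names (LogNormalizeSketch, logSumExp, toProbabilities) are hypothetical.

```java
// Hypothetical helper; not part of Mahout.
final class LogNormalizeSketch {

    private LogNormalizeSketch() {}

    /** Returns log(sum_i exp(values[i])), factoring out the max for numerical stability. */
    static double logSumExp(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // empty input, or every score is -infinity
        }
        double sum = 0.0;
        for (double v : values) {
            sum += Math.exp(v - max);
        }
        return max + Math.log(sum);
    }

    /**
     * Converts the unnormalized log scores of all words under a single topic
     * into probabilities p(w|t) that sum to 1.
     */
    static double[] toProbabilities(double[] logScores) {
        double logTotal = logSumExp(logScores); // normalization constant for this topic
        double[] probs = new double[logScores.length];
        for (int w = 0; w < logScores.length; w++) {
            // log p(w|t) = score(w,t) - logTotal(t)
            probs[w] = Math.exp(logScores[w] - logTotal);
        }
        return probs;
    }
}
```

Calling toProbabilities on the scores for one topic gives values in [0, 1] that sum to 1, whereas applying exp to a raw score directly can exceed 1.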

-- David

On Mon, Dec 6, 2010 at 12:06 PM, Quiroz Hernandez, Andres
<Andres.QuirozHernandez@xerox.com> wrote:
> Thanks for your quick reply, Ted. It looks like either the probabilities are not normalized
> or the function being used is not a simple sum of log probabilities, because exp does not
> always return a value between 0 and 1. I will take a look at the code to see if I can find
> exactly how the value is calculated (but if anyone knows the function used, and if I can directly
> invert it to find P(w|t) please let me know).
>
> Thanks again,
>
> Andres
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Monday, December 06, 2010 11:57 AM
> To: user@mahout.apache.org
> Subject: Re: Probability from log likelihood in LDA output
>
> Yes.  It should be possible to use exp to get the actual probability.  The
> fact that it is a sum
> of log probabilities just means that the probability is a product of
> probabilities.
>
> It is possible that the probabilities are not normalized, but that would be
> a bit surprising for
> this kind of algorithm.
>
> On Mon, Dec 6, 2010 at 8:02 AM, Quiroz Hernandez, Andres <
> Andres.QuirozHernandez@xerox.com> wrote:
>
>> Hello,
>>
>> As I understand it, the output for LDA is a log likelihood value for
>> each word/topic pair, which is a function of log(P(w|t)). Is it possible
>> to invert that function to obtain P(w|t)? I have a feeling it is not,
>> since it looks like the final value is obtained as a sum of log
>> probabilities, but I just wanted to check, since an output as a
>> probability is more readable than the likelihood value given.
>>
>> Thanks,
>>
>> Andres
>>
>
