# mahout-user mailing list archives

From Dan Filimon <dangeorge.fili...@gmail.com>
Subject Re: Log-likelihood ratio test as a probability
Date Thu, 20 Jun 2013 08:52:55 GMT
```
My understanding:

Yes, the log-likelihood ratio (-2 log lambda) follows a chi-squared
distribution with 1 degree of freedom in the 2x2 table case.
        A    ~A
  B
 ~B

We're testing to see if p(A | B) = p(A | ~B). That's the null hypothesis. I
compute the LLR. The larger that is, the more unlikely the null hypothesis
is to be true.
I can then look that value up in a chi-squared table with df=1 and get p, the
probability of seeing a value at least that large if the null hypothesis were
true (the upper tail).
So, the probability of them being similar is 1 - p (which is exactly the
CDF at that value of X).

Now, my question is: in the contingency table case, why would I normalize?
It's a ratio already, isn't it?

On Thu, Jun 20, 2013 at 11:03 AM, Sean Owen <srowen@gmail.com> wrote:

> someone can check my facts here, but the log-likelihood ratio follows
> a chi-square distribution. You can figure an actual probability from
> that in the usual way, from its CDF. You would need to tweak the code
> you see in the project to compute an actual LLR by normalizing the
> input.
>
> You could use 1-p then as a similarity metric.
>
> This also isn't how the test statistic is turned into a similarity
> metric in the project now. But 1-p sounds nicer. Maybe the historical
> reason was speed, or, ignorance.
>
> On Thu, Jun 20, 2013 at 8:53 AM, Dan Filimon
> <dangeorge.filimon@gmail.com> wrote:
> > When computing item-item similarity using the log-likelihood similarity
> > [1], can I simply apply a sigmoid to the resulting values to get the
> > probability that two items are similar?
> >
> > Is there any other processing I need to do?
> >
> > Thanks!
> >
> > [1] http://tdunning.blogspot.ro/2008/03/surprise-and-coincidence.html
>

```
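The steps discussed in this thread (compute -2 log lambda for a 2x2 table, then read the upper tail p and the CDF value 1 - p from a chi-squared distribution with 1 degree of freedom) can be sketched as below. This is only an illustration under stated assumptions: the function names `llr_2x2`, `chi2_cdf_df1` and `chi2_sf_df1` are hypothetical and are not Mahout's API, the cell names `k11..k22` are just labels for the four counts, and the inputs are assumed to be raw co-occurrence counts. The df=1 CDF uses the identity CDF(x) = erf(sqrt(x/2)), so only the standard library is needed.

```python
import math


def llr_2x2(k11, k12, k21, k22):
    """Return G^2 = -2 log lambda for a 2x2 table of raw counts.

    Hypothetical helper for illustration; cells are laid out as
        k11 = (A, B),  k12 = (~A, B),  k21 = (A, ~B),  k22 = (~A, ~B).
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            observed = cells[i][j]
            if observed > 0:  # a zero cell contributes 0 (limit of x log x)
                expected = rows[i] * cols[j] / n
                total += observed * math.log(observed / expected)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * total)


def chi2_cdf_df1(x):
    """CDF of the chi-squared distribution with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))


def chi2_sf_df1(x):
    """Upper tail (p-value) of the chi-squared distribution with df=1."""
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    g2 = llr_2x2(10, 2, 3, 40)
    print("LLR:", g2)
    print("upper tail p:", chi2_sf_df1(g2))
    print("1 - p (CDF):", chi2_cdf_df1(g2))
    # Multiplying every count by 10 multiplies the statistic by 10,
    # which is why the chi-squared lookup needs counts, not proportions.
    print("LLR for 10x counts:", llr_2x2(100, 20, 30, 400))
```

One point relevant to the normalization question: the statistic is not scale-free. Multiplying all four counts by a constant multiplies G^2 by the same constant, so the chi-squared lookup only makes sense on raw counts rather than on proportions.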