mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Understanding LogLikelihood Similarity
Date Wed, 30 Apr 2014 23:21:56 GMT
OK.

Whether a user has interacted with A is a sample from a binomial
distribution with an unknown parameter p_A.  Likewise with B and p_B.  The
two binomial distributions may or may not be independent.

The LLR measures the degree of evidence against independence.
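The sketch below shows one way to read the contingency table as two binomial samples. It follows the standard two-sample binomial likelihood ratio test; the function names are hypothetical and not Mahout's API.

The idea is to split users by whether they interacted with B:

- Among the n1 = k11 + k21 users who did, k1 = k11 also interacted with A.
- Among the n2 = k12 + k22 users who did not, k2 = k12 interacted with A.

That gives p1 = k1/n1 and p2 = k2/n2. Under independence, both samples share one rate, p = (k1 + k2)/(n1 + n2). The LLR is -2 ln of the ratio between the likelihood under the shared p and the likelihood under separate p1 and p2.

```python
import math


def xlogy(x: float, y: float) -> float:
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_log_likelihood(p: float, k: int, n: int) -> float:
    """ln(p^k * (1 - p)^(n - k)); the binomial coefficient is dropped
    because it cancels in the likelihood ratio."""
    return xlogy(k, p) + xlogy(n - k, 1.0 - p)


def llr_binomial(k11: int, k12: int, k21: int, k22: int) -> float:
    """-2 ln(lambda): does interacting with B change the rate at which users hit A?"""
    k1, n1 = k11, k11 + k21   # sample 1: users who interacted with B
    k2, n2 = k12, k12 + k22   # sample 2: users who did not interact with B
    if n1 == 0 or n2 == 0:
        return 0.0            # one sample is empty, so there is nothing to compare
    p1, p2 = k1 / n1, k2 / n2         # separate rates (alternative hypothesis)
    p = (k1 + k2) / (n1 + n2)         # one shared rate (independence)
    return 2.0 * (binomial_log_likelihood(p1, k1, n1)
                  + binomial_log_likelihood(p2, k2, n2)
                  - binomial_log_likelihood(p, k1, n1)
                  - binomial_log_likelihood(p, k2, n2))


print(llr_binomial(1, 0, 0, 1))      # 4 ln 2, about 2.77
print(llr_binomial(10, 10, 10, 10))  # 0.0: no association
```

So the values p, p1, p2, k1, k2, n1, and n2 are all derived from k11..k22. The four counts are enough because the binomial samples are just the rows of the table seen from B's side.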




On Thu, May 1, 2014 at 12:50 AM, Mario Levitin <mariolevitin@gmail.com> wrote:

> Ted, I understand how the contingency table is constructed, and how to
> compute the LLR value. What I cannot understand is how to link this with
> binomial distributions.
>
>
> On Thu, May 1, 2014 at 1:02 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > The contingency table is constructed by looking at how many users have
> > expressed preference or interest in two items.  If the items are A and B,
> > the pertinent counts are
> >
> > k11 - the number of users who interacted with both A and B
> > k12 - the number of users who interacted with A but not B
> > k21 - the number of users who interacted with B but not A
> > k22 - the number of users who interacted with neither A nor B.
> >
> > These are the values that go into the contingency table, and they are all
> > that is needed to compute the LLR value.
> >
> > See http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html for a
> > detailed description.
> >
> >
> >
> >
> > On Wed, Apr 30, 2014 at 11:31 PM, Mario Levitin <mariolevitin@gmail.com> wrote:
> >
> > > Hi Ted,
> > > I have read the paper. I understand the "Likelihood Ratio for Binomial
> > > Distributions" part.
> > > However, I cannot make a connection between this part and the contingency
> > > table.
> > >
> > > In order to calculate the Likelihood Ratio for two Binomial Distributions,
> > > you need the values p, p1, p2, k1, k2, n1, and n2.
> > > But the information contained in the contingency table is different from
> > > these values. So, again, I do not understand how the information contained
> > > in the contingency table is linked with the Likelihood Ratio for Binomial
> > > Distributions.
> > >
> > > In order to find the similarity between two users, I tend to think of the
> > > boolean preferences of user1 as a sample from a binomial distribution and
> > > the boolean preferences of user2 as another sample from a binomial
> > > distribution, and then use the LLR to assess how likely it is that these
> > > distributions are the same. But I don't think this is correct, since this
> > > calculation does not use the contingency table.
> > >
> > > I hope my question is clear.
> > > Thanks.
> > >
> > >
> > >
> > > On Mon, Apr 28, 2014 at 2:41 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >
> > > > Excellent.  Look forward to hearing your reactions.
> > > >
> > > > On Mon, Apr 28, 2014 at 1:14 AM, Mario Levitin <mariolevitin@gmail.com> wrote:
> > > >
> > > > > Not yet, but I will.
> > > > >
> > > > > >
> > > > > > Have you read my original paper on the topic of LLR?  It explains the
> > > > > > connection with chi^2 measures of association.
> > > > >
> > > >
> > >
> >
>
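The counts k11..k22 described in the thread can also be computed directly from the sets of users who touched each item. The sketch below has hypothetical helper names and assumes the total number of users is known, since k22 depends on it. It builds the table and computes the LLR in entropy form: 2N times the mutual information between "interacted with A" and "interacted with B". For the same table, this is the same statistic as the two-binomial likelihood ratio.

Pearson's chi^2 is included only to compare with the chi^2 measures of association mentioned in the earliest message. It is a different statistic, although under independence both are asymptotically chi^2-distributed with one degree of freedom.

```python
import math
from typing import AbstractSet, Tuple


def contingency_counts(users_a: AbstractSet, users_b: AbstractSet,
                       num_users: int) -> Tuple[int, int, int, int]:
    """Return (k11, k12, k21, k22) for items A and B."""
    k11 = len(users_a & users_b)        # interacted with both A and B
    k12 = len(users_a - users_b)        # A but not B
    k21 = len(users_b - users_a)        # B but not A
    k22 = num_users - k11 - k12 - k21   # neither A nor B
    return k11, k12, k21, k22


def x_log_x(x: int) -> float:
    """x * ln(x), with 0 * ln(0) taken as 0."""
    return 0.0 if x == 0 else x * math.log(x)


def scaled_entropy(*counts: int) -> float:
    """N * H(counts / N) in nats, where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr_entropy(k11: int, k12: int, k21: int, k22: int) -> float:
    """2 * N * mutual information between 'used A' and 'used B'."""
    rows = scaled_entropy(k11 + k12, k21 + k22)   # A vs. not A
    cols = scaled_entropy(k11 + k21, k12 + k22)   # B vs. not B
    cells = scaled_entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (rows + cols - cells))


def pearson_chi2(k11: int, k12: int, k21: int, k22: int) -> float:
    """Pearson's X^2 for the same 2x2 table, without continuity correction."""
    n = k11 + k12 + k21 + k22
    if n == 0:
        return 0.0
    observed = ((k11, k12), (k21, k22))
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Example: 10 users in total; sets of users who interacted with A and with B.
k = contingency_counts({"u1", "u2", "u3"}, {"u2", "u3", "u4"}, num_users=10)
print(k)  # (2, 1, 1, 6)
print(llr_entropy(*k), pearson_chi2(*k))
```

The entropy form needs only the four counts and their row and column sums, so it never has to name p, p1, or p2 explicitly.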
