mahout-user mailing list archives

From Mridul Kapoor <mridulkap...@gmail.com>
Subject Re: Multidimensional log-likelihood similarity
Date Tue, 01 Oct 2013 08:15:18 GMT
Thanks Ted. It's awesome (and intuitive) how you reduced my problem by
treating features as users!

Mridul


On 30 September 2013 10:47, Ted Dunning <ted.dunning@gmail.com> wrote:

> Yes.  You can turn the normal item-item relationships around to get this.
>
> What you have is an item x feature matrix.  Normally, one has a user x item
> matrix in cooccurrence analysis and you get an item x item matrix.
>
> If you consider the features to be "users" in the computation, then the
> resulting indicator matrix would be just what you want.
>
> The basic idea is that items would be related if they share features.  Two
> items that have the same feature would be said to co-occur on that feature.
>  Finding anomalous cooccurrence would be what you need to do to find items
> that co-occur on many features.
>
> This works by building a small 2x2 matrix that relates item A and item B.
>  The entries would be feature counts.  The upper left entry of the matrix
> is the number of features that A and B both have, the upper right is the
> number of features that B has that A does not and so on. Put another way,
> the columns represent features that A has or does not have respectively and
> the rows represent the features that B has or does not have respectively.
>  Items that give high root log-likelihood ratio values should be considered
> connected.  Those that have small values for root LLR should be considered
> not connected.  The value of the root-LLR should be discarded after
> thresholding and should not be considered a measure of the strength of the
> relationship.
>
> I would recommend the same down-sampling that RowSimilarityJob already
> does.
>
>
>
>
>
> On Sun, Sep 29, 2013 at 3:40 AM, Mridul Kapoor <mridulkapoor@gmail.com>
> wrote:
>
> > Hi
> >
> > I have records - items - with many features.
> > Something like
> >
> > ID, feature1, feature2, ..., featureN
> >
> >
> > Can I leverage Mahout's log-likelihood similarity metrics for calculating
> > the K-Most similar items to a given item X?
> >
> > -
> > Thanks
> > Mridul
> >
>
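
The 2x2 table and root-LLR thresholding described in the quoted reply can be sketched in a few lines of Python. This is an illustration only, not Mahout's API: it uses the standard entropy form of the log-likelihood ratio, and the function names, the `n_features` argument (total number of distinct features) and the `threshold` value are all assumptions made for the example. Following the advice above, the score is used only to decide whether two items are connected and is then discarded.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    # Signed square root: negative when the items share fewer features
    # than independence would predict.
    score = math.sqrt(llr(k11, k12, k21, k22))
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        score = -score
    return score


def contingency(a, b, n_features):
    # a, b: sets of features for item A and item B.
    both = len(a & b)      # upper left: features A and B both have
    only_b = len(b - a)    # upper right: features B has that A does not
    only_a = len(a - b)    # lower left: features A has that B does not
    neither = n_features - both - only_b - only_a
    return both, only_b, only_a, neither


def connected_items(target, items, n_features, threshold):
    # Names of items whose root LLR against `target` exceeds the
    # (hypothetical) threshold; the score itself is not returned.
    a = items[target]
    return sorted(
        name
        for name, b in items.items()
        if name != target
        and root_llr(*contingency(a, b, n_features)) > threshold
    )


if __name__ == "__main__":
    # Made-up items and features, 10 distinct features in total.
    items = {
        "x": {"f1", "f2", "f3"},
        "y": {"f1", "f2", "f3"},
        "z": {"f7", "f8"},
    }
    print(connected_items("x", items, n_features=10, threshold=1.0))
```

Each record's features become a set, so "item x feature" plays the role that "user x item" plays in ordinary cooccurrence analysis. For large data sets, the down-sampling mentioned above would still be needed before comparing pairs; this sketch does not attempt it.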
