mahout-user mailing list archives

From Amit Nithian <anith...@gmail.com>
Subject Re: Question about Pearson Correlation in non-Taste mode
Date Wed, 27 Nov 2013 14:02:42 GMT
Thanks Sebastian! Is there a particular reason for that?
On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ssc.open@googlemail.com> wrote:

> Hi Amit,
>
> You are right, the non-co-rated items are not filtered out in the
> distributed implementation.
>
> --sebastian
>
>
> On 26.11.2013 20:51, Amit Nithian wrote:
> > Hi all,
> >
> > Apologies if this is a repeat question; I just joined the list. I have a
> > question about the way metrics like Cosine and Pearson are calculated in
> > Hadoop "mode" (i.e. non-Taste).
> >
> > As far as I understand, the vectors used for computing pairwise item
> > similarity in Taste are based on the co-rated items; however, in the
> > Hadoop implementation, I don't see this done.
> >
> > The implementation of the distributed item-item similarity comes from
> > this paper: http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf.
> > I didn't see anything in the paper about filtering out the vector
> > elements that are not co-rated, and this can make a difference,
> > especially when you normalize the ratings by the average item rating.
> > Computing that average means dividing by the number of users who rated
> > the item, and when only co-rated users are counted, that number can be
> > smaller, depending on the sparseness of the vector.
> >
> > Any clarification on this would be helpful.
> >
> > Thanks!
> > Amit
> >
>
>
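The difference Amit describes can be sketched in a few lines of Python. This is a minimal illustration, not Mahout code. `pearson_corated` follows the co-rated approach the thread attributes to Taste. `pearson_full_vector` is an assumption about the distributed variant: it centers each item vector on the mean of all of that item's own ratings and then takes the cosine over the full sparse vectors, so users who rated only one of the two items still affect the mean and the norms. Both function names and the example ratings are hypothetical.

```python
import math


def pearson_corated(a, b):
    """Pearson correlation restricted to users who rated both items.

    a, b: dicts mapping user id -> rating for two items.
    """
    common = sorted(set(a) & set(b))
    n = len(common)
    if n == 0:
        return float("nan")
    xs = [a[u] for u in common]
    ys = [b[u] for u in common]
    # Means are taken over the co-rated users only.
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)) * math.sqrt(
        sum((y - my) ** 2 for y in ys)
    )
    return num / den if den else float("nan")


def pearson_full_vector(a, b):
    """Assumed full-vector variant: center each item on ALL its own ratings,
    then compute the cosine of the two sparse vectors (no co-rated filter)."""

    def center(v):
        if not v:
            return {}
        mean = sum(v.values()) / len(v)  # divides by every user who rated the item
        return {u: r - mean for u, r in v.items()}

    ca, cb = center(a), center(b)
    # Only overlapping users contribute to the dot product (missing entries act as 0),
    # but every rated user contributes to each vector's norm.
    dot = sum(ca[u] * cb[u] for u in set(ca) & set(cb))
    na = math.sqrt(sum(x * x for x in ca.values()))
    nb = math.sqrt(sum(x * x for x in cb.values()))
    return dot / (na * nb) if na and nb else float("nan")


# Hypothetical ratings: only u1 and u2 rated both items.
item1 = {"u1": 5, "u2": 3, "u3": 4, "u4": 1}
item2 = {"u1": 4, "u2": 2, "u5": 5}

print(f"co-rated Pearson:    {pearson_corated(item1, item2):.4f}")
print(f"full-vector Pearson: {pearson_full_vector(item1, item2):.4f}")
```

With these made-up ratings, the co-rated version comes out at essentially 1.0, because only two users overlap and they agree in direction, while the full-vector version is about 0.16, since u3, u4 and u5 shift each item's mean and inflate the vector norms.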
