mahout-user mailing list archives

From Amit Nithian <anith...@gmail.com>
Subject Re: Question about Pearson Correlation in non-Taste mode
Date Wed, 27 Nov 2013 15:09:19 GMT
Comparing this against the non-distributed (Taste) implementation gives
different answers for item-item similarity, since the non-distributed version
of course looks only at co-rated items. I was mostly wondering whether this
difference matters in practice.

Also, I'm confused about how you can compute the Pearson similarity between two
vectors of different lengths, which I think is essentially what is going on here?
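
To make the question concrete, here is a minimal sketch of the two variants as
I understand them. It is not Mahout code: the function names, the dict-based
rating vectors, and the toy ratings are all made up for illustration, and the
second function is only my assumed reading of "non-corated items are not
filtered out" (each item is centered by the mean of all its ratings, the dot
product can only pick up users who rated both items, and the norms run over
every rating of each item). Under that reading the two vectors are not really
of different length; they share one slot per user, and a missing rating simply
contributes nothing to the dot product.

```python
from math import sqrt, nan

def pearson_corated(a, b):
    """Pearson over users who rated both items.

    Means and norms are computed from the co-rating users only.
    """
    common = set(a) & set(b)
    if len(common) < 2:
        return nan
    mean_a = sum(a[u] for u in common) / len(common)
    mean_b = sum(b[u] for u in common) / len(common)
    dot = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    norm_a = sqrt(sum((a[u] - mean_a) ** 2 for u in common))
    norm_b = sqrt(sum((b[u] - mean_b) ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return nan
    return dot / (norm_a * norm_b)

def pearson_all_ratings(a, b):
    """Assumed 'unfiltered' variant (hypothetical, see the note above).

    Each item is centered by the mean of all its ratings; the dot product
    uses co-rating users, but the norms use every rating of each item.
    """
    mean_a = sum(a.values()) / len(a)
    mean_b = sum(b.values()) / len(b)
    dot = sum((a[u] - mean_a) * (b[u] - mean_b) for u in set(a) & set(b))
    norm_a = sqrt(sum((r - mean_a) ** 2 for r in a.values()))
    norm_b = sqrt(sum((r - mean_b) ** 2 for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return nan
    return dot / (norm_a * norm_b)

# Toy ratings keyed by user id; only u1 and u2 rated both items.
item_x = {"u1": 5, "u2": 3, "u3": 4, "u4": 1}
item_y = {"u1": 4, "u2": 2, "u5": 5}

print("co-rated only:", pearson_corated(item_x, item_y))
print("all ratings:  ", pearson_all_ratings(item_x, item_y))
```

On this toy data the co-rated variant sees two perfectly aligned ratings and
comes out at essentially 1, while the all-ratings variant comes out much lower
because the ratings from u3, u4 and u5 shift the means and inflate the norms.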

Thanks again
Amit
On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <ssc.open@googlemail.com>
wrote:

> Yes, it is due to the parallel algorithm which only looks at co-ratings
> from a given user.
>
>
> On 27.11.2013 15:02, Amit Nithian wrote:
> > Thanks Sebastian! Is there a particular reason for that?
> > On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ssc.open@googlemail.com>
> > wrote:
> >
> >> Hi Amit,
> >>
> >> You are right, the non-corated items are not filtered out in the
> >> distributed implementation.
> >>
> >> --sebastian
> >>
> >>
> >> On 26.11.2013 20:51, Amit Nithian wrote:
> >>> Hi all,
> >>>
> >>> Apologies if this is a repeat question as I just joined the list but I
> >>> have a question about the way that metrics like Cosine and Pearson are
> >>> calculated in Hadoop "mode" (i.e. non Taste).
> >>>
> >>> As far as I understand, the vectors used for computing pairwise item
> >>> similarity in Taste are based on the co-rated items; however, in the
> >>> Hadoop implementation, I don't see this done.
> >>>
> >>> The implementation of the distributed item-item similarity comes from
> >>> this paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I
> >>> didn't see anything in this paper about filtering out those elements from
> >>> the vectors not co-rated and this can make a difference especially when
> >>> you normalize the ratings by dividing by the average item rating. In some
> >>> cases, the # users to divide by can be fewer depending on the sparseness
> >>> of the vector.
> >>>
> >>> Any clarity on this would be helpful.
> >>>
> >>> Thanks!
> >>> Amit
> >>>
> >>
> >>
> >
>
>
