mahout-user mailing list archives

From Sebastian Schelter <ssc.o...@googlemail.com>
Subject Re: Question about Pearson Correlation in non-Taste mode
Date Wed, 27 Nov 2013 14:05:28 GMT
Yes, it is due to the parallel algorithm, which only looks at the
co-ratings from a given user.
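
To make the difference discussed in this thread concrete, here is a minimal Python sketch that computes Pearson correlation between two item vectors in two ways. It is not Mahout's code: the function names and the toy ratings are made up for illustration, and the second function reflects only one reading of the distributed approach (centre each item vector by the mean of all its ratings, sum products over co-ratings only, and take norms over the full vectors).

```python
from math import sqrt


def pearson_corated(a, b):
    """Pearson restricted to users who rated both items.

    `a` and `b` map user id -> rating for one item each (hypothetical layout).
    """
    common = set(a) & set(b)
    if len(common) < 2:
        return float("nan")
    mean_a = sum(a[u] for u in common) / len(common)
    mean_b = sum(b[u] for u in common) / len(common)
    dot = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    norm_a = sqrt(sum((a[u] - mean_a) ** 2 for u in common))
    norm_b = sqrt(sum((b[u] - mean_b) ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return dot / (norm_a * norm_b)


def pearson_full_vectors(a, b):
    """Pearson without filtering to co-rated entries (assumed variant).

    Means and norms use every rating of the item; only the dot product
    is limited to co-ratings, because a missing rating contributes zero.
    """
    mean_a = sum(a.values()) / len(a)
    mean_b = sum(b.values()) / len(b)
    common = set(a) & set(b)
    dot = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    norm_a = sqrt(sum((r - mean_a) ** 2 for r in a.values()))
    norm_b = sqrt(sum((r - mean_b) ** 2 for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return dot / (norm_a * norm_b)


# Toy data: only u1 and u2 rated both items.
item_a = {"u1": 5, "u2": 3, "u3": 1}
item_b = {"u1": 4, "u2": 2, "u4": 5, "u5": 5}

print(pearson_corated(item_a, item_b))
print(pearson_full_vectors(item_a, item_b))
```

On this toy data the co-rated variant sees two perfectly aligned ratings and comes out at about 1, while the unfiltered variant comes out at 0, because the extra ratings shift each item's mean and enlarge the norms.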


On 27.11.2013 15:02, Amit Nithian wrote:
> Thanks Sebastian! Is there a particular reason for that?
> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ssc.open@googlemail.com>
> wrote:
> 
>> Hi Amit,
>>
>> You are right, the non-corated items are not filtered out in the
>> distributed implementation.
>>
>> --sebastian
>>
>>
>> On 26.11.2013 20:51, Amit Nithian wrote:
>>> Hi all,
>>>
>>> Apologies if this is a repeat question as I just joined the list but I
>>> have a question about the way that metrics like Cosine and Pearson are
>>> calculated in Hadoop "mode" (i.e. non Taste).
>>>
>>> As far as I understand, the vectors used for computing pairwise item
>>> similarity in Taste are based on the co-rated items; however, in the
>>> Hadoop implementation, I don't see this done.
>>>
>>> The implementation of the distributed item-item similarity comes from
>>> this paper: http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf.
>>> I didn't see anything in this paper about filtering out those elements
>>> of the vectors that are not co-rated, and this can make a difference,
>>> especially when you normalize the ratings by dividing by the average
>>> item rating. In some cases, the number of users to divide by can be
>>> smaller, depending on the sparseness of the vector.
>>>
>>> Any clarity on this would be helpful.
>>>
>>> Thanks!
>>> Amit
>>>
>>
>>
> 

