mahout-user mailing list archives

From Sebastian Schelter <ssc.o...@googlemail.com>
Subject Re: Question about Pearson Correlation in non-Taste mode
Date Wed, 27 Nov 2013 12:46:19 GMT
Hi Amit,

You are right, the non-co-rated items are not filtered out in the
distributed implementation.
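
To make the difference concrete, here is a minimal sketch in plain Python
(not Mahout code). It assumes that the "unfiltered" variant centers each item
vector by the mean of all of that item's own ratings and then takes the cosine
over the full vectors, with missing ratings contributing zero. The function
names and the toy ratings are hypothetical.

```python
import math


def pearson_corated(a, b):
    """Pearson correlation restricted to users who rated both items."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return float("nan")
    xs = [a[u] for u in common]
    ys = [b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pearson_unfiltered(a, b):
    """Assumed unfiltered variant: center each item by the mean of ALL of its
    own ratings, then take the cosine over the full (sparse) vectors."""
    mean_a = sum(a.values()) / len(a)
    mean_b = sum(b.values()) / len(b)
    centered_a = {u: r - mean_a for u, r in a.items()}
    centered_b = {u: r - mean_b for u, r in b.items()}
    # Only co-rated users contribute to the dot product ...
    dot = sum(centered_a[u] * centered_b[u]
              for u in centered_a.keys() & centered_b.keys())
    # ... but the norms run over every rating of each item.
    norm_a = math.sqrt(sum(v * v for v in centered_a.values()))
    norm_b = math.sqrt(sum(v * v for v in centered_b.values()))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return dot / (norm_a * norm_b)


# Toy ratings (user -> rating); only u1 and u2 rated both items.
item_a = {"u1": 5, "u2": 3, "u3": 4, "u4": 1}
item_b = {"u1": 4, "u2": 2, "u5": 5}

print(pearson_corated(item_a, item_b))
print(pearson_unfiltered(item_a, item_b))
```

On this toy data the co-rated variant sees two perfectly aligned points,
while the unfiltered variant is pulled toward zero by the ratings that the
two items do not share.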

--sebastian


On 26.11.2013 20:51, Amit Nithian wrote:
> Hi all,
> 
> Apologies if this is a repeat question, as I just joined the list, but I have
> a question about the way that metrics like Cosine and Pearson are
> calculated in Hadoop "mode" (i.e. non-Taste).
> 
> As far as I understand, the vectors used for computing pairwise item
> similarity in Taste are based on the co-rated items; however, in the Hadoop
> implementation, I don't see this done.
> 
> The implementation of the distributed item-item similarity comes from this
> paper: http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I didn't
> see anything in the paper about filtering out the vector elements that are
> not co-rated, and this can make a difference, especially when you
> normalize the ratings by dividing by the average item rating. In some
> cases, the number of users to divide by can be smaller, depending on the
> sparseness of the vector.
> 
> Any clarity on this would be helpful.
> 
> Thanks!
> Amit
> 

