mahout-user mailing list archives

From Gruszowska Natalia <Natalia.Gruszow...@grupaonet.pl>
Subject RE: Collaborative filtering item-based in mahout - without isolating users
Date Thu, 11 Dec 2014 12:23:08 GMT
To be honest, I haven't seen the code of this similarity (have you?). But as I see it,
it ignores the other side: this time, the popular items. Additionally, it looks like it
ignores the value of the rating; it uses only 1 or 0.

N.

-----Original Message-----
From: mario.alemi@gmail.com [mailto:mario.alemi@gmail.com] 
Sent: Thursday, December 11, 2014 12:00 PM
To: user@mahout.apache.org
Subject: Re: Collaborative filtering item-based in mahout - without isolating users

> otherwise we recommend only very popular items

this is why you have the log-likelihood ratio, right?
m
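Mario's point can be checked on example 1 from the original post. Below is a minimal Python sketch of Dunning's log-likelihood ratio (the G statistic) over a 2x2 co-occurrence table. It is written for illustration and is not taken from Mahout's code; the helper `cooccurrence_counts` is hypothetical. Only whether a user interacted with an item enters the counts, which is also why, as Natalia notes, rating values play no role.

```python
import math


def x_log_x(x: float) -> float:
    # Convention: 0 * log(0) is taken to be 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # Unnormalized Shannon entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio (G statistic) for a 2x2 table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def cooccurrence_counts(users_a: set, users_b: set, n_users: int) -> tuple:
    """Hypothetical helper: 2x2 table from the sets of users who interacted
    with item A and item B. Only presence counts; rating values are ignored."""
    k11 = len(users_a & users_b)            # A and B
    k12 = len(users_a - users_b)            # A but not B
    k21 = len(users_b - users_a)            # B but not A
    k22 = n_users - len(users_a | users_b)  # neither
    return k11, k12, k21, k22


# Example 1: item 11 rated only by user 3, item 12 rated by users 1-4.
ex1 = cooccurrence_counts({3}, {1, 2, 3, 4}, n_users=4)
print(ex1, log_likelihood_ratio(*ex1))  # (1, 0, 3, 0) 0.0

# Contrast: two items shared by the same 2 of 10 users, nobody else.
print(log_likelihood_ratio(*cooccurrence_counts({1, 2}, {1, 2}, n_users=10)))
```

Because item 12 is in every basket, co-occurring with it carries no evidence of association, so the score is 0 rather than 1. The contrast case, where two otherwise rare items always appear together, scores well above 0.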

On Thu, Dec 11, 2014 at 11:51 AM, Gruszowska Natalia < Natalia.Gruszowska@grupaonet.pl>
wrote:

> Mario,
> I am thinking in terms of correctness. For similarities like Euclidean,
> Pearson correlation or cosine similarity, results are better if we
> consider only common users (users who rated both compared items). This
> assumption lets us find similar items even for unpopular ones;
> otherwise we recommend only very popular items. For my data that is unacceptable.
>
> "But if you take, for example, the cosine similarity, you shouldn't
> throw away the data." You should: it results in dimension reduction,
> and that is good. Everything is still in the same space, but for each
> pair the space is reduced.
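The two positions on cosine similarity can be compared on example 1 from the original post. In the minimal Python sketch below (names are illustrative, not Mahout's API), each item is a dict from user ID to rating: `cosine_full` uses every user as a dimension and treats a missing rating as 0, while `cosine_corated` first drops every user who did not rate both items, i.e. the per-pair dimension reduction described above.

```python
import math

Ratings = dict[int, float]  # user id -> rating; a missing key means "not rated"


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else float("nan")


def cosine_full(item_a: Ratings, item_b: Ratings) -> float:
    # Every user is a dimension; a missing rating counts as 0. Users who
    # rated neither item add only zeros, so the union of raters is enough.
    users = sorted(item_a.keys() | item_b.keys())
    return cosine([item_a.get(u, 0.0) for u in users],
                  [item_b.get(u, 0.0) for u in users])


def cosine_corated(item_a: Ratings, item_b: Ratings) -> float:
    # Per-pair dimension reduction: keep only users who rated both items.
    users = sorted(item_a.keys() & item_b.keys())
    return cosine([item_a[u] for u in users], [item_b[u] for u in users])


item_11 = {3: 1.0}
item_12 = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
print(cosine_full(item_11, item_12))     # 0.5
print(cosine_corated(item_11, item_12))  # 1.0
```

With a single co-rated user, any two positive ratings give a co-rated cosine of 1, so the restriction removes the popularity effect at the cost of resting the score on very little data.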
>
> My question is: why did whoever wrote this code ignore such an
> important assumption? Was it by accident, or for some important
> reason such as efficiency or computational complexity?
>
>
> Natalia
>
>
> -----Original Message-----
> From: mario.alemi@gmail.com [mailto:mario.alemi@gmail.com]
> Sent: Wednesday, December 10, 2014 7:05 PM
> To: user@mahout.apache.org
> Subject: Re: Collaborative filtering item-based in mahout - without 
> isolating users
>
> Hi Natalia
>
> Regarding example 1, if you think in terms of the likelihood that the two
> products have been bought together because they are similar (as opposed
> to by chance), the similarity is undefined. As everyone buys 12, of
> course the person who bought 11 also bought 12, right?
>
> This holds if you compute the similarity through a co-occurrence matrix
> (and the log-likelihood ratio).
>
> But you say "In the theory, similarity between two items should be 
> calculated only for users who ranked both items".
>
> I guess you mean: "Users [1,2,4] don't know about item 11, therefore 
> they do not collaborate in building the similarity between the two 
> items. User [3], on the contrary, does, and gives the same rating to 
> the two products, therefore the similarity is 1".
>
> But if you take, for example, the cosine similarity, you shouldn't
> throw away the data. Here, you build a space with four dimensions: the
> ratings of four users. You can't say product 11 is in another space
> when it relates to users 1, 2 and 4 just because it hasn't been rated
> by those users. They are all there. They are dimensions, like in physics.
> Therefore you must use this information too. Items are in the user
> space... all of them.
>
> Even intuitively, items 11 and 12 are not similar at all: one has been
> bought by every customer, the other by just one customer. How could
> you tell the next customer who buys 12 (everyone does...) that she
> would really like 11...?
>
> Mario
>
>
> On Wed, Dec 10, 2014 at 4:40 PM, Gruszowska Natalia < 
> Natalia.Gruszowska@grupaonet.pl> wrote:
>
> > Hi All,
> >
> > Mahout implements a method for item-based collaborative filtering
> > called itemsimilarity, which returns the "similarity" between each
> > pair of items.
> > In theory, the similarity between two items should be calculated
> > only over users who rated both items. During testing I realized that
> > Mahout works differently.
> > Below are two examples.
> >
> > Example 1: items 11 and 12.
> > In the example below, the similarity between items 11 and 12 should be
> > equal to 1, but Mahout outputs 0.36. It looks like Mahout treats null as 0.
> > Similarity between items:
> > 101     102     0.36602540378443865
> >
> > Matrix with preferences (- = no rating):
> > user    11    12
> > 1       -     1
> > 2       -     1
> > 3       1     1
> > 4       -     1
> >
> > Example 2: items 101-103.
> > In theory, the similarity between items 101 and 102 should be
> > calculated using only the ratings of users 4 and 5, and the same
> > applies to items 101 and 103. Here (101,103) comes out as more
> > similar than (101,102), and it shouldn't.
> > Similarity between items:
> > 101     102     0.2612038749637414
> > 101     103     0.4340578302732228
> > 102     103     0.2600070276638468
> >
> > Matrix with preferences (- = no rating):
> > user    101    102    103
> > 1       -      1      0.1
> > 2       -      1      0.1
> > 3       -      1      0.1
> > 4       1      1      0.1
> > 5       1      1      0.1
> > 6       -      1      0.1
> > 7       -      1      0.1
> > 8       -      1      0.1
> > 9       -      1      0.1
> > 10      -      1      0.1
> >
> >
> > Both examples were run without any additional parameters.
> > Is this problem solved somewhere, somehow? Any ideas? Why is null
> > treated as 0?
> > Source: http://files.grouplens.org/papers/www10_sarwar.pdf
> >
> >
> >
> > Kind regards,
> > Natalia Gruszowska
> >
> >
> >
>
