mahout-user mailing list archives

From mario.al...@gmail.com
Subject Re: Collaborative filtering item-based in mahout - without isolating users
Date Thu, 11 Dec 2014 11:00:02 GMT
> otherwise we recommend only very popular items

This is why you have the log-likelihood ratio, right?
m
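
A minimal sketch of that point, assuming Dunning's log-likelihood ratio computed on a 2x2 co-occurrence table. The function names and the argument layout are illustrative, not Mahout's API. For example 1 quoted below, item 12 was bought by all four users, so its co-occurrence with item 11 carries no information and the ratio evaluates to 0 (up to floating-point rounding):

```python
from math import log


def xlogx(x):
    # Convention: 0 * log(0) is taken as 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 co-occurrence table.

    k11: users with both items      k12: users with the first item only
    k21: users with the second only k22: users with neither item
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by rounding.
    return max(0.0, 2.0 * (row + col - mat))


# Example 1 from the thread: user 3 has items 11 and 12,
# users 1, 2 and 4 have item 12 only, nobody has neither.
example1 = llr(1, 0, 3, 0)

# Hypothetical contrast (not from the thread): two users have both
# items and two users have neither, so the items co-occur perfectly.
contrast = llr(2, 0, 0, 2)

if __name__ == "__main__":
    print("example 1:", example1)
    print("contrast: ", contrast)
```

An item that everyone has cannot be distinguished from chance co-occurrence, which is why very popular items do not dominate under this measure.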

On Thu, Dec 11, 2014 at 11:51 AM, Gruszowska Natalia <
Natalia.Gruszowska@grupaonet.pl> wrote:

> Mario,
> I am thinking in terms of correctness. For similarities such as Euclidean,
> Pearson correlation, or cosine similarity, results are better if we consider
> only common users (users who rated both compared items). This assumption
> lets us find similar items for those that are unpopular; otherwise we
> recommend only very popular items. For my data that is unacceptable.
>
> "But if you take, for example, the cosine similarity, you shouldn't throw
> away the data." - You should; it results in dimension reduction, and that
> is good. Everything is still in the same space, but for each pair the space
> is reduced.
>
> My question is: why did whoever wrote this code ignore such an important
> assumption? Was it by accident, or for some important reason such as
> efficiency or computational complexity?
>
>
> Natalia
>
>
> -----Original Message-----
> From: mario.alemi@gmail.com [mailto:mario.alemi@gmail.com]
> Sent: Wednesday, December 10, 2014 7:05 PM
> To: user@mahout.apache.org
> Subject: Re: Collaborative filtering item-based in mahout - without
> isolating users
>
> Hi Natalia
>
> Regarding example 1, if you think in terms of the likelihood that the two
> products were bought together because they are similar (as opposed to by
> chance), the similarity is undefined. As everyone buys 12, of course the
> person who bought 11 also bought 12, right?
>
> That is the case if you compute the similarity through a co-occurrence
> matrix (and log-likelihood ratio).
>
> But you say "In the theory, similarity between two items should be
> calculated only for users who ranked both items".
>
> I guess you mean: "Users [1,2,4] don't know about item 11, therefore they
> do not collaborate in building the similarity between the two items. User
> [3], on the contrary, does, and gives the same rating to the two products,
> therefore the similarity is 1".
>
> But if you take, for example, the cosine similarity, you shouldn't throw
> away the data. Here, you build a space with four dimensions: the ratings of
> four users. You can't say product 11 is in another space with respect to
> users 1, 2, and 4 just because it hasn't been rated by those users. They
> are all there. They are dimensions, as in physics. Therefore you must use
> this information too. All items are in the user space.
>
> Even intuitively, items 11 and 12 are not similar at all: one has been
> bought by every customer, the other by just one customer. How could you
> tell the next customer who buys 12 (everyone does...) that she would really
> like 11?
>
> Mario
>
>
> On Wed, Dec 10, 2014 at 4:40 PM, Gruszowska Natalia <
> Natalia.Gruszowska@grupaonet.pl> wrote:
>
> > Hi All,
> >
> > Mahout implements a method for item-based collaborative filtering
> > called itemsimilarity, which returns the "similarity" between each
> > pair of items.
> > In the theory, similarity between two items should be calculated only
> > for users who ranked both items. During testing I realized that
> > Mahout works differently.
> > Two examples follow.
> >
> > Example 1. Items are 11-12.
> > In the example below, the similarity between items 11 and 12 should equal
> > 1, but Mahout's output is 0.36. It looks like Mahout treats null as 0.
> > Similarity between items:
> > 11      12      0.36602540378443865
> >
> > Matrix with preferences:
> >             11       12
> > 1                     1
> > 2                     1
> > 3           1         1
> > 4                     1
> >
> > Example 2. Items are 101-103.
> > The similarity between items 101 and 102 should be calculated using only
> > the ratings of users 4 and 5, and the same for items 101 and 103
> > (according to the theory). Here (101,103) comes out more similar than
> > (101,102), and it shouldn't.
> > Similarity between items:
> > 101     102     0.2612038749637414
> > 101     103     0.4340578302732228
> > 102     103     0.2600070276638468
> >
> > Matrix with preferences:
> >             101      102        103
> > 1                     1         0.1
> > 2                     1         0.1
> > 3                     1         0.1
> > 4           1         1         0.1
> > 5           1         1         0.1
> > 6                     1         0.1
> > 7                     1         0.1
> > 8                     1         0.1
> > 9                     1         0.1
> > 10                    1         0.1
> >
> >
> > Both examples were run without any additional parameters.
> > Has this problem been solved somewhere, somehow? Any ideas? Why is null
> > treated as 0?
> > Source: http://files.grouplens.org/papers/www10_sarwar.pdf
> >
> >
> >
> > Kind regards,
> > Natalia Gruszowska
> >
> >
> >
>
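
The two examples in the original question can be checked with a short sketch. The reported numbers are consistent with a similarity of 1 / (1 + d), where d is the Euclidean distance taken over all users with a missing rating read as 0. That the run actually used this measure is an assumption; the thread does not say which similarity was selected. The names `euclid_sim` and `co_rated_only` are illustrative and are not part of Mahout.

```python
from math import sqrt

# Preference matrices from the two examples; a missing key means "not rated".
example1 = {
    11: {3: 1.0},
    12: {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0},
}
users = range(1, 11)
example2 = {
    101: {4: 1.0, 5: 1.0},
    102: {u: 1.0 for u in users},
    103: {u: 0.1 for u in users},
}


def euclid_sim(a, b, co_rated_only):
    """Return 1 / (1 + Euclidean distance) between two item rating dicts.

    co_rated_only=False: use every user who rated either item and read a
                         missing rating as 0.
    co_rated_only=True:  use only users who rated both items.
    """
    keys = (a.keys() & b.keys()) if co_rated_only else (a.keys() | b.keys())
    d = sqrt(sum((a.get(u, 0.0) - b.get(u, 0.0)) ** 2 for u in keys))
    return 1.0 / (1.0 + d)


def all_pairs(prefs, co_rated_only):
    # Similarity for every unordered pair of items in a preference matrix.
    items = sorted(prefs)
    return {
        (x, y): euclid_sim(prefs[x], prefs[y], co_rated_only)
        for i, x in enumerate(items)
        for y in items[i + 1:]
    }


full1 = all_pairs(example1, co_rated_only=False)
co1 = all_pairs(example1, co_rated_only=True)
full2 = all_pairs(example2, co_rated_only=False)
co2 = all_pairs(example2, co_rated_only=True)

if __name__ == "__main__":
    for name, full, co in (("Example 1", full1, co1), ("Example 2", full2, co2)):
        print(name)
        for pair in full:
            print(pair, "missing as 0:", full[pair], "co-rated only:", co[pair])
```

With missing ratings read as 0, the sketch reproduces the four similarities quoted in the question to within floating-point rounding. Restricted to co-rated users, (11,12) and (101,102) both become 1, and (101,102) ranks above (101,103), which is the ordering the question expects.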
