mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Why does SIMILARITY_EUCLIDEAN_DISTANCE only generates outputs with a similarity score of "1" for binary input?
Date Sun, 08 May 2011 17:57:53 GMT
All preferences are "1" in your world. Therefore user vectors are
always like (1,1,...,1). The distance between any two is 0, and the
similarity is 1. This metric is not appropriate for binary data. The
closest thing to what I think you want is the
TanimotoCoefficientSimilarity, but also try LogLikelihoodSimilarity.
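A minimal sketch of why this happens. It assumes the distance is computed only over users who expressed a preference for both items, and that distance is mapped to similarity as 1 / (1 + d). Both are assumptions chosen to match the explanation above; Mahout's own EuclideanDistanceSimilarity may normalize differently. The Tanimoto coefficient here is the plain set ratio |A ∩ B| / |A ∪ B|, not Mahout's class itself, and the item and user names are hypothetical.

```python
import math


def euclidean_similarity(a, b):
    """1 / (1 + d), with d computed only over users who rated both items (assumed)."""
    common = a.keys() & b.keys()
    if not common:
        return None  # no overlapping users: similarity undefined
    d = math.sqrt(sum((a[u] - b[u]) ** 2 for u in common))
    return 1.0 / (1.0 + d)


def tanimoto_similarity(a, b):
    """|A intersect B| / |A union B| over the sets of users who liked each item."""
    union = a.keys() | b.keys()
    if not union:
        return None
    return len(a.keys() & b.keys()) / len(union)


# Hypothetical binary "like" data: every stored preference is 1.
item_x = {"u1": 1, "u2": 1, "u3": 1}
item_y = {"u2": 1, "u3": 1, "u4": 1, "u5": 1}
item_z = {"u3": 1, "u6": 1}

for name, other in [("y", item_y), ("z", item_z)]:
    # Euclidean is always 1.0 here; Tanimoto reflects how much the audiences overlap.
    print(name, euclidean_similarity(item_x, other), tanimoto_similarity(item_x, other))
```

Every co-rated coordinate differs by 1 - 1 = 0, so d = 0 and the similarity collapses to 1.0 for any overlapping pair. Tanimoto instead gives 0.4 for x/y and 0.25 for x/z, because it looks at how many users the two items share relative to all users who liked either.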

Yes, if you have a range of ratings, not just 1, it becomes meaningful
again to look at distance as a similarity metric.
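As a sketch under the same assumptions (distance over co-rated users only, similarity = 1 / (1 + d)), here is the like/dislike encoding from the quoted message, with 1 for a like and -1 for a dislike. The item and user names are hypothetical. Once preferences can disagree, the distance is no longer always 0 and the scores spread out below 1.

```python
import math


def euclidean_similarity(a, b):
    """1 / (1 + d), with d computed only over users who rated both items (assumed)."""
    common = a.keys() & b.keys()
    if not common:
        return None  # no overlapping users: similarity undefined
    d = math.sqrt(sum((a[u] - b[u]) ** 2 for u in common))
    return 1.0 / (1.0 + d)


# Hypothetical like (1) / dislike (-1) data for three items.
item_x = {"u1": 1, "u2": 1, "u3": -1}
item_y = {"u1": 1, "u2": -1, "u3": -1}   # disagrees with x on one user
item_z = {"u1": -1, "u2": -1, "u3": 1}   # disagrees with x on every user

print(euclidean_similarity(item_x, item_x))  # identical ratings: d = 0
print(euclidean_similarity(item_x, item_y))  # d = 2
print(euclidean_similarity(item_x, item_z))  # d = sqrt(12)
```

Identical items still score 1.0, one disagreement gives 1/3, and total disagreement gives about 0.22, so the metric now ranks pairs instead of returning a constant.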

Sean

On Sun, May 8, 2011 at 5:37 PM, Thomas Söhngen <thomas@beluto.com> wrote:
> Hello everyone,
>
> I am calculating similar items with the SIMILARITY_EUCLIDEAN_DISTANCE
> class. My input is binary data, users clicking a like button. The output
> only contains similarities with a score of "1". It doesn't compute a
> similarity for every pair of items, but for every pair it does find, the
> score is always "1". Why is this?
>
> I don't have this problem when I also add "dislike" information, with
> input lines "item_id,user_id,1" for a like interaction and
> "item_id,user_id,-1" for a dislike. The similarities then lie between 0 and 1.
>
> Regards and thanks for suggestions,
> Thomas
>
