mahout-user mailing list archives

From "Thomas, Sebastien" <Sebastien.Tho...@disney.com>
Subject RE: Simple Result Interpretation Question
Date Thu, 06 Sep 2012 15:54:18 GMT
Thanks for your reply! But all the other similarity measures give me pretty similar results:

Pearson: -0.14 < similarity < 0.12
Uncentered_cosine: 0.79 < similarity < 0.85
Tanimoto: 0.001 < similarity < 0.2
Loglikelihood: 0.8 < similarity < 0.99
 
Thanks

-----Original Message-----
From: Sean Owen [mailto:srowen@gmail.com] 
Sent: Thursday, September 06, 2012 11:27 AM
To: user@mahout.apache.org
Subject: Re: Simple Result Interpretation Question

This sounds like rounding error. If I recall correctly the Euclidean distance is converted
to similarity with a function like 1/(1+d). I suppose the embedded assumption is that distances
are "not extremely small". If your vector space has small values and distances are commonly
0.000001 or something, the results would always be near 1.
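
To illustrate the point above, here is a minimal sketch of the 1/(1+d) conversion as described in this message. It is not Mahout's actual code; the function name and the sample vectors are made up for illustration. It shows that when every coordinate is tiny, the distance is tiny too, and the similarity is pinned near 1.

```python
import math

def euclidean_similarity(a, b):
    """Hypothetical helper: map Euclidean distance d to a similarity via 1 / (1 + d)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

# Made-up vectors: same shape, but one pair is scaled down by a factor of 1e6.
tiny_a = [0.000001, 0.000002, 0.000003]
tiny_b = [0.000003, 0.000001, 0.000002]
large_a = [1.0, 2.0, 3.0]
large_b = [3.0, 1.0, 2.0]

sim_tiny = euclidean_similarity(tiny_a, tiny_b)     # distance is about 2.4e-6
sim_large = euclidean_similarity(large_a, large_b)  # distance is about 2.4

print(f"tiny values:  {sim_tiny:.7f}")
print(f"large values: {sim_large:.7f}")
```

The two pairs differ only by scale, yet the tiny-valued pair comes out almost indistinguishable from 1 while the other does not.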

You can make up another translation to [0,1], or scale your values if that's the cause. Or
try another metric; basing on the Euclidean distance has always been a bit artificial.
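
One possible reading of "scale your values" is sketched below. It assumes you have the raw distances available and picks the median distance as the scale, which is only one of many reasonable choices; the function name and the sample distances are hypothetical and not part of Mahout.

```python
from statistics import median

def rescaled_similarities(distances):
    """Hypothetical helper: divide each distance by the median, then apply 1 / (1 + d)."""
    scale = median(distances)
    if scale == 0:
        # All (or most) distances are zero; nothing to spread out.
        return [1.0 for _ in distances]
    return [1.0 / (1.0 + d / scale) for d in distances]

# Made-up distances that would all map to roughly 1.0 without rescaling.
distances = [1e-6, 2e-6, 4e-6]
sims = rescaled_similarities(distances)

for d, s in zip(distances, sims):
    print(f"d={d:.0e} -> similarity {s:.3f}")
```

After rescaling, the median distance maps to a similarity of 0.5, so the ordering of the pairs becomes visible instead of collapsing to 1.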

On Thu, Sep 6, 2012 at 4:13 PM, Thomas, Sebastien <Sebastien.Thomas@disney.com> wrote:
> Hi community,
>
> I am new to Mahout and am looking for some hints. I am running "itemsimilarity"
> with about 8 million users and 32 items. My output file (with the format: <item1, item2,
> similarity>) is basically telling me that all my items are similar (if my interpretation
> is right). For example, all the similarities are 1s when I run the EUCLIDEAN_DISTANCE
> similarity class.
>
> I would appreciate any help to understand and know what to do.
>
> Thank you
>
> Sebastien