mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Simple Result Interpretation Question
Date Thu, 06 Sep 2012 15:27:03 GMT
This sounds like rounding error. If I recall correctly, the Euclidean
distance is converted to a similarity with a function like 1/(1+d). I
suppose the embedded assumption is that distances are "not extremely
small". If your vector space has small values and distances are
commonly 0.000001 or something, the results would always be near 1,
and would show up as 1 once rounded.
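
To make the effect concrete, here is a minimal Python sketch of a 1/(1+d)
style conversion. It is not Mahout's code; the function name and the sample
vectors are made up for illustration, and the exact formula Mahout uses may
differ from what I recall.

```python
import math

def euclidean_similarity(a, b):
    """Hypothetical helper: map Euclidean distance d to 1 / (1 + d)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

# Made-up vectors whose components are all around 0.000001.
small_a = [0.000001, 0.000002, 0.000003]
small_b = [0.000003, 0.000001, 0.000002]

# The same pattern of values, a million times larger.
large_a = [1.0, 2.0, 3.0]
large_b = [3.0, 1.0, 2.0]

# The first similarity is within a few millionths of 1; the second is not.
print(euclidean_similarity(small_a, small_b))
print(euclidean_similarity(large_a, large_b))
```

The two pairs differ in exactly the same way relative to their scale, yet
only the small-valued pair collapses to a similarity of about 1.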

You can make up another translation to [0,1], or scale your values if
that's the cause. Or try another metric; basing a similarity on the
Euclidean distance has always been a bit artificial.
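
As one possible "other translation", the sketch below divides each distance
by a typical distance in the data before applying 1/(1+d). This is only an
assumption about what might work for you, not something Mahout provides; the
names `scaled_similarity` and `scale`, and the three sample items, are
hypothetical.

```python
import itertools
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scaled_similarity(a, b, scale):
    """Hypothetical variant: 1 / (1 + d / scale), which stays in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b) / scale)

# Made-up item vectors with very small components.
items = {
    "A": [0.000001, 0.000002, 0.000003],
    "B": [0.000003, 0.000001, 0.000002],
    "C": [0.000001, 0.000002, 0.000004],
}

pairs = list(itertools.combinations(sorted(items), 2))

# Use the mean pairwise distance as the "typical" distance (an assumption).
scale = sum(euclidean_distance(items[p], items[q]) for p, q in pairs) / len(pairs)

sims = {(p, q): scaled_similarity(items[p], items[q], scale) for p, q in pairs}
for pair, sim in sims.items():
    print(pair, sim)
```

With the distances expressed relative to the data's own scale, the pairs no
longer all land next to 1, and their ordering by distance is preserved.
Multiplying the input values by a constant before computing the similarity
would have much the same effect.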

On Thu, Sep 6, 2012 at 4:13 PM, Thomas, Sebastien
<Sebastien.Thomas@disney.com> wrote:
> Hi community,
>
> I am new to mahout and I am looking for some hint. I am running the
> "itemsimilarity", I have about 8 million users and 32 items. My output
> file (with the format: <item1, item2, similarity>) is basically telling
> me that all my items are similar (if my interpretation is right). For
> example, all the similarities are 1s when I run the EUCLIDEAN_DISTANCE
> similarity class.
>
> I would appreciate any help to understand and know what to do.
>
> Thank you
>
> Sebastien
