mahout-user mailing list archives

From John Conwell <j...@iamjohn.me>
Subject Re: Simple Result Interpretation Question
Date Thu, 06 Sep 2012 16:11:13 GMT
I'm curious: with 8 million users and only 32 products, your data might not
be sparse enough (I never thought that would be a problem). Enough of your
users may have purchased a high enough percentage of your products that
you end up with every item being recommended for every other item.
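
One way to check this hunch is to measure how full the user/item matrix is
and how many item pairs share at least one buyer. The following is only a
rough sketch in plain Python, not Mahout code: the function names and the
toy `purchases` dict (user id mapped to a set of item ids) are made up for
illustration.

```python
from itertools import combinations

def matrix_density(purchases, n_items):
    """Fraction of the user x item matrix that holds a purchase."""
    filled = sum(len(items) for items in purchases.values())
    return filled / (len(purchases) * n_items)

def cooccurring_pair_fraction(purchases, n_items):
    """Fraction of item pairs bought together by at least one user."""
    seen = set()
    for items in purchases.values():
        # Sort so each pair is stored once, as (smaller, larger).
        seen.update(combinations(sorted(items), 2))
    total_pairs = n_items * (n_items - 1) // 2
    return len(seen) / total_pairs if total_pairs else 0.0

# Hypothetical toy data: 4 users, 3 items.
purchases = {
    "u1": {0, 1, 2},
    "u2": {0, 1},
    "u3": {1, 2},
    "u4": {0, 1, 2},
}
print(matrix_density(purchases, 3))
print(cooccurring_pair_fraction(purchases, 3))
```

If the second number is close to 1.0 on the real data, every item co-occurs
with every other item, and an all-to-all similarity output would not be
surprising.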



On Thu, Sep 6, 2012 at 8:54 AM, Thomas, Sebastien <
Sebastien.Thomas@disney.com> wrote:

> Thanks for your reply! But all the others give me pretty similar results.
>
> Pearson: -0.14 < similarity < 0.12
> Uncentered_cosine: 0.79 < similarity < 0.85
> Tanimoto: 0.001 < similarity < 0.2
> Loglikelihood: 0.8 < similarity < 0.99
>
> Thanks
>
> -----Original Message-----
> From: Sean Owen [mailto:srowen@gmail.com]
> Sent: Thursday, September 06, 2012 11:27 AM
> To: user@mahout.apache.org
> Subject: Re: Simple Result Interpretation Question
>
> This sounds like rounding error. If I recall correctly the Euclidean
> distance is converted to similarity with a function like 1/(1+d). I suppose
> the embedded assumption is that distances are "not extremely small". If
> your vector space has small values and distances are commonly 0.000001 or
> something, the results would always be near 1.
>
> You can make up another translation to [0,1], or scale your values if
> that's the cause. Or try another metric; basing similarity on the
> Euclidean distance has always been a bit artificial.
>
> On Thu, Sep 6, 2012 at 4:13 PM, Thomas, Sebastien <
> Sebastien.Thomas@disney.com> wrote:
> > Hi community,
> >
> > I am new to Mahout and I am looking for some hints. I am running
> > "itemsimilarity" with about 8 million users and 32 items. My output file
> > (with the format: <item1, item2, similarity>) is basically telling me that
> > all my items are similar (if my interpretation is right). For example, all
> > the similarities are 1s when I run the EUCLIDEAN_DISTANCE similarity class.
> >
> > I would appreciate any help to understand and know what to do.
> >
> > Thank you
> >
> > Sebastien
>
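
To illustrate the rounding effect described in the quoted reply, here is a
minimal sketch in plain Python. It assumes the conversion really is
1/(1+d), which the reply only recalls from memory; the helper name and the
sample vectors are invented and this is not Mahout's actual code.

```python
import math

def euclidean_similarity(a, b):
    """Map Euclidean distance d to a similarity via 1 / (1 + d) (assumed form)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

# Hypothetical vectors with very small values: distance is about 1.4e-6.
tiny_a = [0.000001, 0.000002]
tiny_b = [0.000002, 0.000001]
print(euclidean_similarity(tiny_a, tiny_b))  # very close to 1

# Scaling the same vectors by 1e6 spreads the similarities out again.
scaled_a = [x * 1e6 for x in tiny_a]
scaled_b = [x * 1e6 for x in tiny_b]
print(euclidean_similarity(scaled_a, scaled_b))
```

With values this small, every pair lands within rounding distance of 1,
which matches the all-1s output; rescaling the input is one of the
workarounds suggested above.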



-- 

Thanks,
John C
