mahout-user mailing list archives

From Sean Owen <>
Subject Re: Custom Item Similarity :datamodel not sure
Date Fri, 28 Sep 2012 16:43:46 GMT
I don't know if you can speed this up very directly. You can try a
different similarity metric. But if you really want to compute every
item-item pair, it's necessarily going to scale as the square of the
number of items, and that will be slow. Consider whether you need to
precompute every pair.
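To make the quadratic cost concrete, here is a minimal brute-force sketch in plain Java. It is not Mahout's API: the dense `double[]` item vectors, cosine similarity, and the class and method names are illustrative assumptions. Even though it uses symmetry to score each unordered pair only once, it still performs n(n-1)/2 similarity evaluations.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

public class AllPairsTopN {

    // Number of similarity evaluations performed (reset before each run).
    static long evaluations = 0;

    // Orders {similarity, itemIndex} entries by similarity, lowest first.
    static final Comparator<double[]> BY_SIM = Comparator.comparingDouble(e -> e[0]);

    // Cosine similarity of two dense preference vectors (illustrative representation).
    static double cosine(double[] a, double[] b) {
        evaluations++;
        double dot = 0, na = 0, nb = 0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            na += a[k] * a[k];
            nb += b[k] * b[k];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Adds a candidate and evicts the weakest one if the heap exceeds n entries.
    static void offer(PriorityQueue<double[]> heap, double sim, int other, int n) {
        heap.add(new double[] {sim, other});
        if (heap.size() > n) {
            heap.poll();
        }
    }

    // For every item, returns the indices of its n most similar other items, best first.
    static List<List<Integer>> topN(double[][] items, int n) {
        int count = items.length;
        List<PriorityQueue<double[]>> heaps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            heaps.add(new PriorityQueue<>(BY_SIM));
        }
        // Each unordered pair is scored once and credited to both items:
        // count * (count - 1) / 2 evaluations in total.
        for (int i = 0; i < count; i++) {
            for (int j = i + 1; j < count; j++) {
                double s = cosine(items[i], items[j]);
                offer(heaps.get(i), s, j, n);
                offer(heaps.get(j), s, i, n);
            }
        }
        List<List<Integer>> result = new ArrayList<>();
        for (PriorityQueue<double[]> heap : heaps) {
            List<double[]> entries = new ArrayList<>(heap);
            entries.sort(BY_SIM.reversed());
            List<Integer> ids = new ArrayList<>();
            for (double[] e : entries) {
                ids.add((int) e[1]);
            }
            result.add(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int size : new int[] {100, 200, 400}) {
            double[][] items = new double[size][20];
            for (double[] row : items) {
                for (int k = 0; k < row.length; k++) {
                    row[k] = rnd.nextInt(6); // hypothetical ratings 0..5 from 20 users
                }
            }
            evaluations = 0;
            topN(items, 10);
            // Prints 4950, 19900, 79800 evaluations for 100, 200, 400 items.
            System.out.println(size + " items -> " + evaluations + " similarity evaluations");
        }
    }
}
```

Doubling the item count from 200 to 400 multiplies the work by roughly four. This is the square-law growth described above, regardless of which similarity metric is plugged in.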

On Fri, Sep 28, 2012 at 5:07 PM, Abhishek Roy <> wrote:
> Thanks for your inputs, Sean. I implemented the top N (most similar items)
> by looking at and reusing the existing mostSimilarItems. It works fine. Now, scale
> in action! Testing with a set of 200,000 items, computing the most similar
> items for one item takes around 20 seconds.
> My approach is to precompute the most similar items for all 200,000 items.
> I am not looking at Hadoop for now (2000-item base currently). I know I can
> reduce my data size for similarity computation.
> What are my options?
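A quick back-of-envelope estimate puts the numbers in the question into perspective. This minimal Java sketch assumes that every item costs the same roughly 20 seconds as the one measured and that the work runs serially with no sharing between items. The class and method names are hypothetical.

```java
public class PrecomputeEstimate {
    // Total items to precompute, taken from the message above.
    static final long ITEMS = 200_000L;
    // Observed wall-clock seconds to find the most similar items for one item.
    static final double SECONDS_PER_ITEM = 20.0;

    // Distinct unordered item pairs: n(n-1)/2.
    static long distinctPairs(long n) {
        return n * (n - 1) / 2;
    }

    // Estimated serial time in days, assuming every item costs the measured time.
    static double totalDays(long n, double secondsPerItem) {
        return n * secondsPerItem / 86_400.0;
    }

    public static void main(String[] args) {
        System.out.println("pairs: " + distinctPairs(ITEMS));            // pairs: 19999900000
        System.out.printf("days: %.1f%n", totalDays(ITEMS, SECONDS_PER_ITEM)); // days: 46.3
    }
}
```

At about 2 × 10^10 pairs and roughly 46 days of serial computation, precomputing every pair on a single machine is impractical at this scale. This is why the reply suggests reconsidering whether all pairs are needed.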
