mahout-user mailing list archives

From Sebastian Schelter <>
Subject Re: Need to reduce execution time of RowSimilarityJob
Date Tue, 18 Sep 2012 12:57:42 GMT
You don't need to develop an in-memory implementation; we already have one.

Simply use a GenericItemBasedRecommender and ask it for the most similar
items of each item.
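
As a rough sketch of that suggestion, assuming Mahout's Taste classes are on the classpath: the file name interactions.csv, the choice of LogLikelihoodSimilarity, and the cutoff of 10 are illustrative assumptions and are not stated in this thread. FileDataModel reads lines of the form userID,itemID[,value], so for a document-by-word matrix the word ID would go in the first column and the document ID in the second.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

final class MostSimilarItemsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical input file, loaded fully into memory.
        DataModel model = new FileDataModel(new File("interactions.csv"));

        // Assumed similarity measure; other ItemSimilarity implementations can be swapped in.
        ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
        GenericItemBasedRecommender recommender =
                new GenericItemBasedRecommender(model, similarity);

        int howMany = 10; // assumed number of neighbours to keep per item

        // Ask the recommender for the most similar items of each item.
        LongPrimitiveIterator itemIDs = model.getItemIDs();
        while (itemIDs.hasNext()) {
            long itemID = itemIDs.nextLong();
            List<RecommendedItem> similar = recommender.mostSimilarItems(itemID, howMany);
            for (RecommendedItem other : similar) {
                System.out.println(itemID + "\t" + other.getItemID() + "\t" + other.getValue());
            }
        }
    }
}
```

Everything here runs in a single JVM, so the data has to fit in the heap.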

On 18.09.2012 14:49, yamo93 wrote:
> Hi,
> I have 30,000 items, and the computation takes more than 2 hours on a
> pseudo-cluster, which is too long in my case.
> I have thought of some ways to reduce the execution time of RowSimilarityJob,
> and I wonder whether any of you have implemented them (and how) or explored
> other ways.
> 1. tune the JVM
> 2. develop an in-memory implementation (i.e. without Hadoop)
> 3. reduce the size of the matrix (for example, by removing items that have
> no words in common)
> 4. run on a real Hadoop cluster with several nodes (does anyone have an
> idea of the number of nodes needed to make it worthwhile?)
> Thanks for your help,
> Yann.
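
Options 2 and 3 in the quoted list can be illustrated together with a minimal plain-Java sketch. It is not Mahout's implementation: the class name SparseRowSimilarity, the use of cosine similarity, and the row representation (a map from word ID to a nonzero weight) are all assumptions made for the example. An inverted index from word to rows ensures that only pairs of rows sharing at least one word are ever compared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SparseRowSimilarity {

    /**
     * Cosine similarity for every pair of rows (i < j) that share at least one column.
     * Each row maps a column (word) ID to a nonzero weight.
     */
    static Map<Integer, Map<Integer, Double>> cosineSimilarities(List<Map<Integer, Double>> rows) {
        // Inverted index: column ID -> indices of rows containing that column.
        Map<Integer, List<Integer>> index = new HashMap<>();
        double[] norms = new double[rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            double sumOfSquares = 0.0;
            for (Map.Entry<Integer, Double> cell : rows.get(r).entrySet()) {
                index.computeIfAbsent(cell.getKey(), k -> new ArrayList<>()).add(r);
                sumOfSquares += cell.getValue() * cell.getValue();
            }
            norms[r] = Math.sqrt(sumOfSquares);
        }

        // Accumulate dot products only for rows that co-occur in some column.
        Map<Integer, Map<Integer, Double>> result = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> posting : index.entrySet()) {
            int column = posting.getKey();
            List<Integer> rowIds = posting.getValue(); // ascending, because rows were indexed in order
            for (int a = 0; a < rowIds.size(); a++) {
                for (int b = a + 1; b < rowIds.size(); b++) {
                    int i = rowIds.get(a);
                    int j = rowIds.get(b);
                    double product = rows.get(i).get(column) * rows.get(j).get(column);
                    result.computeIfAbsent(i, k -> new HashMap<>()).merge(j, product, Double::sum);
                }
            }
        }

        // Turn dot products into cosines by dividing by the two row norms.
        for (Map.Entry<Integer, Map<Integer, Double>> byRow : result.entrySet()) {
            int i = byRow.getKey();
            for (Map.Entry<Integer, Double> pair : byRow.getValue().entrySet()) {
                pair.setValue(pair.getValue() / (norms[i] * norms[pair.getKey()]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> rows = new ArrayList<>();
        rows.add(Map.of(1, 1.0, 2, 1.0)); // row 0
        rows.add(Map.of(2, 1.0, 3, 1.0)); // row 1 shares column 2 with row 0
        rows.add(Map.of(4, 1.0));         // row 2 shares nothing and is never compared
        System.out.println(cosineSimilarities(rows));
    }
}
```

The work done is proportional to the number of co-occurring pairs rather than to the square of the number of rows, which is why very frequent words (long posting lists) dominate the cost.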
