mahout-user mailing list archives

From yamo93
Subject Need to reduce execution time of RowSimilarityJob
Date Tue, 18 Sep 2012 12:49:31 GMT

I have 30,000 items, and the computation takes more than 2 hours on a
pseudo-cluster, which is too long for my use case.

I have thought of a few ways to reduce the execution time of
RowSimilarityJob. Has anyone implemented any of them (and if so, how),
or explored other approaches?
1. tune the JVM
2. develop an in-memory implementation (i.e. without Hadoop)
3. reduce the size of the matrix (for example, by removing rows that
share no words with any other row)
4. run on a real Hadoop cluster with several nodes (does anyone have an
idea of how many nodes it takes for this to be worthwhile?)
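
To make options 2 and 3 concrete, here is a rough sketch of what I have
in mind. It is not Mahout code and does not reproduce what
RowSimilarityJob does internally. It assumes plain cosine similarity,
assumes all rows fit in memory as sparse term -> weight maps, and the
names (row_similarities, rows, threshold) are made up for illustration.
An inverted index means only pairs of rows sharing at least one word
are ever compared.

```python
from collections import defaultdict
from math import sqrt


def row_similarities(rows, threshold=0.0):
    """Cosine similarity between sparse rows, computed in memory.

    rows: dict mapping item id -> dict mapping term -> weight.
    Returns a dict {(a, b): similarity} with a < b, containing only
    pairs that share at least one term and reach the threshold.
    """
    # L2 norm of every row, computed once.
    norms = {
        item: sqrt(sum(w * w for w in vec.values()))
        for item, vec in rows.items()
    }

    # Inverted index: term -> list of (item, weight).
    index = defaultdict(list)
    for item, vec in rows.items():
        for term, weight in vec.items():
            index[term].append((item, weight))

    # Accumulate dot products term by term. Pairs of rows with no
    # term in common never show up here, so they cost nothing.
    dots = defaultdict(float)
    for postings in index.values():
        for i in range(len(postings)):
            a, wa = postings[i]
            for j in range(i + 1, len(postings)):
                b, wb = postings[j]
                key = (a, b) if a < b else (b, a)
                dots[key] += wa * wb

    # Normalise the dot products into cosine similarities.
    sims = {}
    for (a, b), dot in dots.items():
        denom = norms[a] * norms[b]
        if denom > 0.0:
            sim = dot / denom
            if sim >= threshold:
                sims[(a, b)] = sim
    return sims


if __name__ == "__main__":
    example = {
        "doc1": {"hadoop": 1.0, "mahout": 1.0},
        "doc2": {"hadoop": 1.0},
        "doc3": {"lucene": 2.0},  # shares no word with the others
    }
    for pair, sim in sorted(row_similarities(example).items()):
        print(pair, round(sim, 4))
```

The cost is driven by the length of the posting lists, so a word that
appears in most rows still produces close to a quadratic number of
pairs. Dropping very frequent words first would probably matter as much
as dropping the rows with nothing in common.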

Thanks for your help,
