mahout-user mailing list archives

From yamo93 <yam...@gmail.com>
Subject Re: Need to reduce execution time of RowSimilarityJob
Date Tue, 18 Sep 2012 13:13:16 GMT
Thanks,

I need some clarification:
GenericItemBasedRecommender needs a FileDataModel with userId, itemId, 
score.
But I have text documents, and today I run seq2sparse followed by 
rowid and rowsimilarity.
How do I call GenericItemBasedRecommender with sparse vectors?

Y.

On 09/18/2012 02:57 PM, Sebastian Schelter wrote:
> You don't need to develop an in-memory implementation, we already have that.
>
> Simply use a GenericItemBasedRecommender and ask it for the most similar
> items of each item.
>
>
> On 18.09.2012 14:49, yamo93 wrote:
>> Hi,
>>
>> I have 30,000 items and the computation takes more than 2h on a
>> pseudo-cluster, which is too long in my case.
>>
>> I think of some ways to reduce the execution time of RowSimilarityJob
>> and I wonder if some of you have implemented them and how, or explored
>> other ways.
>> 1. tune the JVM
>> 2. develop an in-memory implementation (i.e. without hadoop)
>> 3. reduce the size of the matrix (by removing those which have no words
>> in common, for example)
>> 4. run on a real hadoop cluster with several nodes (does anyone have an
>> idea of the number of nodes to make it interesting?)
>>
>> Thanks for your help,
>> Yann.

