mahout-user mailing list archives

From yamo93 <yam...@gmail.com>
Subject Re: Need to reduce execution time of RowSimilarityJob
Date Tue, 18 Sep 2012 13:58:57 GMT
If the document comes first (i.e. plays the "user" role), should I use a 
user-based recommender instead of an item-based one?
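
Something like this sketch is what I have in mind for the user-based 
variant, with documents in the "user" position and a docs.csv of 
(document, term, tfidf) lines as Sebastian suggests below (the file 
name and the neighborhood size are just placeholders):

    import java.io.File;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.UncenteredCosineSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class SimilarDocs {
      public static void main(String[] args) throws Exception {
        // Documents play the "user" role, terms the "item" role,
        // tfidf the "preference" value. docs.csv is a placeholder name.
        DataModel model = new FileDataModel(new File("docs.csv"));
        UserSimilarity similarity = new UncenteredCosineSimilarity(model);
        NearestNUserNeighborhood neighborhood =
            new NearestNUserNeighborhood(10, similarity, model);
        GenericUserBasedRecommender recommender =
            new GenericUserBasedRecommender(model, neighborhood, similarity);
        // the 10 documents most similar to document 42
        long[] similarDocs = recommender.mostSimilarUserIDs(42L, 10);
      }
    }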

On 09/18/2012 03:21 PM, Sebastian Schelter wrote:
> Oh, I overlooked that, sorry. You could give it (document, term, tfidf)
> triples instead. If you find it awkward to use a recommender to compute
> document similarities, then maybe it would be better to think about a
> custom in-memory implementation.
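>
> In case it helps, a rough sketch of such an in-memory computation over
> the tf-idf vectors (assuming they all fit in RAM; note that Mahout's
> CosineDistanceMeasure returns a distance, so similarity is 1 minus it):
>
>     import java.util.List;
>     import org.apache.mahout.common.distance.CosineDistanceMeasure;
>     import org.apache.mahout.math.Vector;
>
>     public class InMemorySimilarity {
>       // vectors: the tf-idf document vectors, already loaded into memory
>       public static void printSimilarities(List<Vector> vectors) {
>         CosineDistanceMeasure cosine = new CosineDistanceMeasure();
>         for (int i = 0; i < vectors.size(); i++) {
>           for (int j = i + 1; j < vectors.size(); j++) {
>             double sim = 1.0 - cosine.distance(vectors.get(i), vectors.get(j));
>             // pairs with no terms in common score 0 and can be skipped
>             if (sim > 0.0) {
>               System.out.println(i + " " + j + " " + sim);
>             }
>           }
>         }
>       }
>     }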
>
>
> On 18.09.2012 15:13, yamo93 wrote:
>> Thanks,
>>
>> I need some explanations:
>> GenericItemBasedRecommender needs a FileDataModel with userId, itemId
>> and score.
>> But I have text documents, and today I use seq2sparse followed by
>> rowid + rowsimilarity.
>> How can I call GenericItemBasedRecommender with sparse vectors?
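>>
>> Would dumping the sparse vectors to a CSV that FileDataModel can read
>> be the way to bridge the two? A sketch of what I mean (paths are just
>> examples, and I assume the rowid matrix is a SequenceFile of
>> IntWritable row ids to VectorWritable vectors):
>>
>>     import java.io.FileWriter;
>>     import java.io.PrintWriter;
>>     import java.util.Iterator;
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.fs.FileSystem;
>>     import org.apache.hadoop.fs.Path;
>>     import org.apache.hadoop.io.IntWritable;
>>     import org.apache.hadoop.io.SequenceFile;
>>     import org.apache.mahout.math.Vector;
>>     import org.apache.mahout.math.VectorWritable;
>>
>>     public class VectorsToCsv {
>>       public static void main(String[] args) throws Exception {
>>         Configuration conf = new Configuration();
>>         FileSystem fs = FileSystem.get(conf);
>>         SequenceFile.Reader reader =
>>             new SequenceFile.Reader(fs, new Path("rowid/matrix"), conf);
>>         IntWritable docId = new IntWritable();
>>         VectorWritable row = new VectorWritable();
>>         PrintWriter out = new PrintWriter(new FileWriter("docs.csv"));
>>         while (reader.next(docId, row)) {
>>           Iterator<Vector.Element> it = row.get().iterateNonZero();
>>           while (it.hasNext()) {
>>             Vector.Element e = it.next();
>>             // docId,termIndex,tfidf -- terms become the "items"
>>             out.println(docId.get() + "," + e.index() + "," + e.get());
>>           }
>>         }
>>         out.close();
>>         reader.close();
>>       }
>>     }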
>>
>> Y.
>>
>> On 09/18/2012 02:57 PM, Sebastian Schelter wrote:
>>> You don't need to develop an in-memory implementation, we already have
>>> that.
>>>
>>> Simply use a GenericItemBasedRecommender and ask it for the most similar
>>> items of each item.
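>>>
>>> In code that is roughly the following (assuming numeric ids in the
>>> data model; the file name and the count of 10 are placeholders):
>>>
>>>     import java.io.File;
>>>     import java.util.List;
>>>     import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
>>>     import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
>>>     import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
>>>     import org.apache.mahout.cf.taste.impl.similarity.UncenteredCosineSimilarity;
>>>     import org.apache.mahout.cf.taste.model.DataModel;
>>>     import org.apache.mahout.cf.taste.recommender.RecommendedItem;
>>>     import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
>>>
>>>     public class MostSimilarItems {
>>>       public static void main(String[] args) throws Exception {
>>>         DataModel model = new FileDataModel(new File("prefs.csv"));
>>>         ItemSimilarity similarity = new UncenteredCosineSimilarity(model);
>>>         GenericItemBasedRecommender recommender =
>>>             new GenericItemBasedRecommender(model, similarity);
>>>         LongPrimitiveIterator items = model.getItemIDs();
>>>         while (items.hasNext()) {
>>>           long itemId = items.nextLong();
>>>           // the 10 items most similar to itemId
>>>           List<RecommendedItem> similar =
>>>               recommender.mostSimilarItems(itemId, 10);
>>>         }
>>>       }
>>>     }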
>>>
>>>
>>> On 18.09.2012 14:49, yamo93 wrote:
>>>> Hi,
>>>>
>>>> I have 30,000 items and the computation takes more than 2 hours on a
>>>> pseudo-cluster, which is too long in my case.
>>>>
>>>> I can think of several ways to reduce the execution time of
>>>> RowSimilarityJob, and I wonder if some of you have implemented them
>>>> (and how), or explored other ways:
>>>> 1. tune the JVM (see the example after this list)
>>>> 2. develop an in-memory implementation (i.e. without Hadoop)
>>>> 3. reduce the size of the matrix (e.g. by removing pairs of documents
>>>> that have no words in common)
>>>> 4. run on a real Hadoop cluster with several nodes (does anyone have
>>>> an idea of how many nodes it takes to become worthwhile?)
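>>>>
>>>> For 1., I imagine passing the child JVM options to the job, something
>>>> like this (values are only examples, and the exact flags may differ
>>>> by Mahout version -- check bin/mahout rowsimilarity --help):
>>>>
>>>>     bin/mahout rowsimilarity \
>>>>       -Dmapred.child.java.opts=-Xmx2g \
>>>>       --input rowid/matrix \
>>>>       --output similarity \
>>>>       --similarityClassname SIMILARITY_COSINE \
>>>>       --maxSimilaritiesPerRow 50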
>>>>
>>>> Thanks for your help,
>>>> Yann.

