mahout-user mailing list archives

From yamo93 <yam...@gmail.com>
Subject Re: Need to reduce execution time of RowSimilarityJob
Date Thu, 20 Sep 2012 08:28:17 GMT
Hello,

If the document is second, the result will be recommendations of documents
to terms, won't it?
What I need is to find the most similar documents (i.e. recommend documents
to documents, no?)

Thx for your help,
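For reference: with documents in second position, each input line for GenericItemBasedRecommender is term,document,tfidf. Terms then play the role of users and documents the role of items, so asking the recommender for the most similar items of a document (mostSimilarItems(documentId, howMany)) returns the most similar documents. Below is a minimal stdlib-only sketch of that transposition. It assumes the seq2sparse tf-idf vectors have already been dumped to (documentId, termId, tfidf) triples with numeric IDs. The class, record and method names are hypothetical and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Mahout class): rewrites (document, term, tfidf)
 * triples as "term,document,tfidf" lines, i.e. userID,itemID,preference,
 * so that documents become the items of the data model.
 */
public class TransposeTriples {

    /** One (documentId, termId, tfidf) entry; assumed to come from a dump of the seq2sparse output. */
    record Triple(long documentId, long termId, double tfidf) {}

    /** Emits one line per triple with the term in the user slot and the document in the item slot. */
    static List<String> toFileDataModelLines(List<Triple> triples) {
        List<String> lines = new ArrayList<>();
        for (Triple t : triples) {
            lines.add(t.termId() + "," + t.documentId() + "," + t.tfidf());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
            new Triple(1L, 42L, 0.5),
            new Triple(2L, 42L, 0.25));
        toFileDataModelLines(triples).forEach(System.out::println);
    }
}
```

Writing these lines to a file gives input in the userID,itemID,preference layout that FileDataModel reads.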

On 09/18/2012 04:03 PM, Sebastian Schelter wrote:
> Another error of mine, sorry, documents need to be second :)
On 09/18/2012 03:58 PM, yamo93 wrote:
> If the document is in first place, should I use a user-based recommender
> instead of an item-based one?
>
> On 09/18/2012 03:21 PM, Sebastian Schelter wrote:
>> Oh, I overlooked that, sorry. You could give it (document,term,tfidf)
>> triples instead. If you find it awkward to use a recommender to compute
>> document similarities, then maybe it would be better to think about a
>> custom in-memory implementation.
>>
>>
>> On 18.09.2012 15:13, yamo93 wrote:
>>> Thanks,
>>>
>>> I need some explanations:
>>> GenericItemBasedRecommender needs a FileDataModel with userId, itemId,
>>> score.
>>> But I have text documents, and today I use seq2sparse followed by
>>> rowid and rowsimilarity.
>>> How do I call GenericItemBasedRecommender with sparse vectors?
>>>
>>> Y.
>>>
>>> On 09/18/2012 02:57 PM, Sebastian Schelter wrote:
>>>> You don't need to develop an in-memory implementation; we already have
>>>> that.
>>>>
>>>> Simply use a GenericItemBasedRecommender and ask it for the most
>>>> similar items of each item.
>>>>
>>>>
>>>> On 18.09.2012 14:49, yamo93 wrote:
>>>>> Hi,
>>>>>
>>>>> I have 30,000 items and the computation takes more than 2 hours on a
>>>>> pseudo-distributed cluster, which is too long in my case.
>>>>>
>>>>> I am considering some ways to reduce the execution time of
>>>>> RowSimilarityJob, and I wonder whether any of you have implemented
>>>>> them (and how) or explored other ways:
>>>>> 1. tune the JVM
>>>>> 2. develop an in-memory implementation (i.e. without Hadoop)
>>>>> 3. reduce the size of the matrix (for example, by removing documents
>>>>> that have no words in common)
>>>>> 4. run on a real Hadoop cluster with several nodes (does anyone have an
>>>>> idea of how many nodes it takes to make this worthwhile?)
>>>>>
>>>>> Thanks for your help,
>>>>> Yann.
>
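Options 2 and 3 from the original question can be combined in one in-memory pass. First, index the tf-idf vectors by term. Then, for each document, accumulate dot products only with the documents reachable through a shared term, so pairs with no words in common are never scored. The sketch below is plain Java with no Mahout or Hadoop dependency. Its class, record and method names are hypothetical. Cosine similarity is assumed as the measure, and the input is assumed to be one termId -> tfidf map per document (for example, read back from the seq2sparse vectors). It illustrates the technique; it is not the code of RowSimilarityJob or GenericItemBasedRecommender.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory sketch: cosine similarity between sparse tf-idf
 * document vectors, scoring only pairs of documents that share a term.
 */
public class InMemoryDocSimilarity {

    /** A similar document and its cosine score. */
    record Neighbour(int documentId, double score) {}

    /** One entry of the inverted index: a document containing the term, with its weight. */
    record Posting(int documentId, double weight) {}

    /**
     * docs.get(d) maps termId -> tfidf weight for document d.
     * Returns, for each document, at most k other documents sorted by descending cosine.
     */
    static List<List<Neighbour>> topKSimilar(List<Map<Integer, Double>> docs, int k) {
        int n = docs.size();
        double[] norms = new double[n];
        Map<Integer, List<Posting>> index = new HashMap<>();

        // Build the inverted index (termId -> postings) and the Euclidean norm of each vector.
        for (int d = 0; d < n; d++) {
            double sumSq = 0.0;
            for (Map.Entry<Integer, Double> e : docs.get(d).entrySet()) {
                sumSq += e.getValue() * e.getValue();
                index.computeIfAbsent(e.getKey(), t -> new ArrayList<>())
                     .add(new Posting(d, e.getValue()));
            }
            norms[d] = Math.sqrt(sumSq);
        }

        List<List<Neighbour>> result = new ArrayList<>();
        for (int d = 0; d < n; d++) {
            // Dot products are accumulated only for documents sharing at least one term with d;
            // documents with no words in common are never visited.
            Map<Integer, Double> dots = new HashMap<>();
            for (Map.Entry<Integer, Double> e : docs.get(d).entrySet()) {
                for (Posting p : index.get(e.getKey())) {
                    if (p.documentId() != d) {
                        dots.merge(p.documentId(), e.getValue() * p.weight(), Double::sum);
                    }
                }
            }
            // Turn dot products into cosines and keep the k best.
            List<Neighbour> neighbours = new ArrayList<>();
            for (Map.Entry<Integer, Double> e : dots.entrySet()) {
                double denom = norms[d] * norms[e.getKey()];
                if (denom > 0.0) {
                    neighbours.add(new Neighbour(e.getKey(), e.getValue() / denom));
                }
            }
            neighbours.sort(Comparator.comparingDouble(Neighbour::score).reversed());
            result.add(new ArrayList<>(neighbours.subList(0, Math.min(k, neighbours.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> docs = List.of(
            Map.of(1, 1.0, 2, 1.0),
            Map.of(1, 1.0, 2, 1.0),
            Map.of(2, 1.0, 3, 1.0),
            Map.of(4, 1.0));
        List<List<Neighbour>> similar = topKSimilar(docs, 2);
        for (int d = 0; d < similar.size(); d++) {
            System.out.println(d + " -> " + similar.get(d));
        }
    }
}
```

Everything stays in a single JVM, and memory grows with the number of non-zero tf-idf entries. The running time is likely dominated by very common terms with long posting lists, which is one reason to consider dropping such terms before computing similarities.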

