mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Mahout performance issues
Date Thu, 01 Dec 2011 15:11:22 GMT
'Cold start' in collaborative filtering lingo refers to the fact that
you need to see some interactions for an item before you can use it
in the recommendation computation. Better not to use it as a technical
term for anything else, to avoid confusion :)

--sebastian

On 01.12.2011 16:06, Daniel Zohar wrote:
> Hi Manuel,
> I haven't got to the point where CacheItemSimilarity kicks in. That is, I
> will have to run a lot of recommendations in order to get a real benefit
> from it. I would first like to optimize the 'cold start' so that it at
> least serves in a reasonable time. Usually a cache is used to prevent
> repeated calculations, but personally I don't think it's a replacement for
> optimized performance. Don't you agree?
> 
> Also, I will try to profile the app now as you suggest and send the results
> asap.
> 
> Thanks!
> 
> On Thu, Dec 1, 2011 at 4:56 PM, Manuel Blechschmidt <
> Manuel.Blechschmidt@gmx.de> wrote:
> 
>> Hi Daniel,
>> you are actually running the profiler inside Tomcat. You should take a
>> snapshot and then drill down to the functions where the actual
>> recommendation takes place. The current screenshots also contain some
>> profiles from Tomcat threads, which sleep a lot and therefore account
>> for a lot of time.
>>
>> Furthermore, the screenshots do not show how often the
>> different functions are called.
>>
>> You have to profile multiple requests in isolation. The
>> CacheItemSimilarity gets filled over time, so requests should get
>> faster and faster.
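
The warm-up effect described above can be illustrated with a minimal memoization sketch. This is not Mahout's implementation: PairSimilarity, MemoizedSimilarity and the toy similarity function are hypothetical names made up for this example, the cache is unbounded and not thread-safe, and the similarity is assumed to be symmetric.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an expensive item-item similarity computation. */
interface PairSimilarity {
    double similarity(long itemA, long itemB);
}

/**
 * Memoizing wrapper: each unordered item pair is computed once by the
 * delegate and served from memory afterwards. Assumes the similarity is
 * symmetric. Unbounded and not thread-safe, to keep the sketch short.
 */
final class MemoizedSimilarity implements PairSimilarity {
    private final PairSimilarity delegate;
    private final Map<Long, Map<Long, Double>> cache = new HashMap<>();
    private int misses = 0;

    public MemoizedSimilarity(PairSimilarity delegate) {
        this.delegate = delegate;
    }

    @Override
    public double similarity(long itemA, long itemB) {
        // Order the pair so that (a, b) and (b, a) share one cache entry.
        long lo = Math.min(itemA, itemB);
        long hi = Math.max(itemA, itemB);
        Map<Long, Double> row = cache.computeIfAbsent(lo, k -> new HashMap<>());
        Double cached = row.get(hi);
        if (cached == null) {
            misses++; // cache miss: fall through to the expensive computation
            cached = delegate.similarity(lo, hi);
            row.put(hi, cached);
        }
        return cached;
    }

    /** Number of times the delegate actually had to compute a value. */
    public int misses() {
        return misses;
    }
}

final class CacheWarmupDemo {
    public static void main(String[] args) {
        // Toy similarity, purely illustrative.
        MemoizedSimilarity sim =
            new MemoizedSimilarity((a, b) -> 1.0 / (1 + Math.abs(a - b)));
        sim.similarity(1, 2); // first request for this pair: computed
        sim.similarity(2, 1); // same pair again: served from the cache
        sim.similarity(1, 3); // new pair: computed
        System.out.println("delegate calls: " + sim.misses());
    }
}
```

Only the first request for a pair pays the full cost, which is why a profile taken on the very first request looks much worse than one taken after many requests.
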
>>
>> On 01.12.2011, at 15:11, Daniel Zohar wrote:
>>
>>> @Manuel thanks for the tips. I have installed VisualVM and the
>>> following are the results.
>>> I did two samplings:
>>> - With the optimized SamplingCandidateItemsStrategy (
>>> http://pastebin.com/6n9C8Pw1): http://static.inky.ws/image/934/image.jpg
>>> - Without the optimized SamplingCandidateItemsStrategy:
>>> http://static.inky.ws/image/935/image.jpg
>>>
>>
>> The big hot spot is the function FastIDSet.find():
>>
>> Optimized: 13,759 s
>> Unoptimized: 246,487 s
>>
>> So you see that your optimization already got you a speedup of
>> roughly 18x.
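
The two FastIDSet.find() timings quoted above can be compared directly. A minimal sketch, assuming the commas are decimal separators (the ratio is the same if they are thousands separators); the class and method names are made up for this example.

```java
import java.util.Locale;

final class SpeedupCheck {
    /** How many times faster the optimized run is (baseline / optimized). */
    public static double speedup(double baseline, double optimized) {
        if (baseline <= 0 || optimized <= 0) {
            throw new IllegalArgumentException("timings must be positive");
        }
        return baseline / optimized;
    }

    /** Share of the baseline time that the optimization removed, in percent. */
    public static double timeSavedPercent(double baseline, double optimized) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return 100.0 * (baseline - optimized) / baseline;
    }

    public static void main(String[] args) {
        double unoptimized = 246.487; // time in FastIDSet.find() without the optimized strategy
        double optimized = 13.759;    // time in FastIDSet.find() with it
        // Locale.ROOT keeps the decimal point independent of the system locale.
        System.out.println(String.format(Locale.ROOT, "speedup: %.1fx",
            speedup(unoptimized, optimized)));
        System.out.println(String.format(Locale.ROOT, "time saved: %.1f%%",
            timeSavedPercent(unoptimized, optimized)));
    }
}
```

This compares only the time sampled in that one method, not the end-to-end request time.
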
>>
>> Did you play around with the CacheItemSimilarity cache sizes?
>>
>> /Manuel
>>
>> --
>> Manuel Blechschmidt
>> Dortustr. 57
>> 14467 Potsdam
>> Mobil: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B
>>
>>
> 

