mahout-user mailing list archives

From Sebastian Schelter
Subject Re: RecommenderJob Hadoop execution times
Date Wed, 30 May 2012 10:28:53 GMT
We have a specialized job called ItemSimilarityJob which only computes
the item similarities.
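
For anyone who wants to try this, here is a minimal sketch of how a launcher might assemble the command line for ItemSimilarityJob. It assumes the job is exposed through the `mahout` driver script under the short name `itemsimilarity` and accepts `--input`, `--output` and `--similarityClassname`; please check these against the `--help` output of your Mahout version. The paths and the `ItemSimilarityCommand` helper are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: builds (but does not execute) a command line for Mahout's
// ItemSimilarityJob. Flag names are assumptions to verify with --help.
class ItemSimilarityCommand {

    // Returns the argument list: driver script, job short name, then options.
    static List<String> buildArgs(String input, String output, String similarity) {
        List<String> args = new ArrayList<>();
        args.add("mahout");                 // Mahout driver script (assumed on PATH)
        args.add("itemsimilarity");         // assumed short name of ItemSimilarityJob
        args.add("--input");
        args.add(input);                    // preference data: userID,itemID[,value]
        args.add("--output");
        args.add(output);                   // directory for the item-item similarities
        args.add("--similarityClassname");
        args.add(similarity);               // similarity measure to use
        return args;
    }

    public static void main(String[] argv) {
        // Hypothetical HDFS paths, for illustration only.
        List<String> args = buildArgs(
                "/user/demo/prefs.csv",
                "/user/demo/item-similarities",
                "SIMILARITY_COOCCURRENCE");
        System.out.println(String.join(" ", args));
    }
}
```

The resulting list could be handed to `ProcessBuilder`, or the printed line pasted into a shell. Running only this job skips the recommendation steps that RecommenderJob performs after the similarity computation.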


On 30.05.2012 12:25, Nikolaos Romanos Katsipoulakis wrote:
> On 05/29/2012 03:23 PM, Sean Owen wrote:
>> I am almost certain it is the combiner phase. The mappers are locally
>> "compacting" the output so much less must be sent to the reducer. You can
>> often speed it up by increasing io.sort.factor (merge more ways) and
>> io.sort.mb (give more space for merging in memory).
>> On Tue, May 29, 2012 at 12:27 PM, Nikolaos Romanos Katsipoulakis wrote:
>>> Hey everybody.
>>> I am working on a recommender system that uses Hadoop to generate item
>>> similarities. Since Mahout has the RecommenderJob example, I tried to run
>>> the recommender on my Hadoop (pseudo-)cluster. I noticed that in the
>>> MapReduce job CooccurrencesMapper - SimilarityReducer there is a large
>>> overhead (approximately 6 minutes). After the map phase ends, there is a
>>> long gap before the reducer starts, yet one CPU was fully loaded during
>>> that time. Why is this happening? Is there an I/O operation hidden in this
>>> time gap?
>>> Thank you
> I tried changing the code in the RecommenderJob setIOSort method, but
> I noticed no change in the execution times. Maybe I will have to change
> the heap size in the Hadoop configuration file. Also, I would like to
> ask: by which MapReduce job have the similarities been computed? I need
> the similarities for my application but not the recommendations, so I
> would like to exclude any computation that relates to the
> recommendations.
> Thank you
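
On the io.sort.factor / io.sort.mb suggestion quoted above: the sort buffer sized by io.sort.mb is allocated inside the map task's JVM heap, so raising it usually means raising the child heap as well. Below is a rough sketch that builds the corresponding `-D` generic options and rejects a buffer that would not fit in the heap. The property names are the Hadoop 1.x ones (`io.sort.factor`, `io.sort.mb`, `mapred.child.java.opts`); the `SortTuning` helper and the numbers are hypothetical. Note that a job which sets these values in its own code (such as the setIOSort method mentioned above) may override whatever is passed on the command line.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: produces Hadoop generic options for sort tuning.
class SortTuning {

    // sortFactor:  number of streams merged at once (io.sort.factor)
    // sortMb:      in-memory sort buffer in MB (io.sort.mb)
    // childHeapMb: maximum heap of each task JVM in MB
    static List<String> genericOptions(int sortFactor, int sortMb, int childHeapMb) {
        if (sortFactor < 2) {
            throw new IllegalArgumentException("io.sort.factor must be at least 2");
        }
        if (sortMb >= childHeapMb) {
            // The sort buffer lives in the task heap, so it has to be smaller.
            throw new IllegalArgumentException("io.sort.mb must be smaller than the child heap");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-Dio.sort.factor=" + sortFactor);
        opts.add("-Dio.sort.mb=" + sortMb);
        opts.add("-Dmapred.child.java.opts=-Xmx" + childHeapMb + "m");
        return opts;
    }

    public static void main(String[] argv) {
        // Illustrative values, not a recommendation for any particular cluster.
        System.out.println(String.join(" ", genericOptions(100, 200, 512)));
    }
}
```

These options go directly after the job name on the command line, before the job-specific arguments.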
