mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: RecommenderJob Hadoop execution times
Date Tue, 29 May 2012 12:23:55 GMT
I am almost certain it is the combiner phase. The mappers are locally
combining ("compacting") their output so that much less data has to be sent
to the reducer, and that merging is what keeps one CPU busy. You can often
speed it up by increasing io.sort.factor (merge more streams at once) and
io.sort.mb (give more memory to the in-memory sort buffer).
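
As a rough sketch, the two properties could be set in mapred-site.xml like
this. The property names are the Hadoop 1.x ones mentioned above; the values
are illustrative assumptions only, not tuned recommendations (the stock
defaults are 10 and 100). Note that io.sort.mb is taken from the map task's
JVM heap, so the heap has to be large enough to hold it.

```xml
<!-- mapred-site.xml fragment; values are assumptions for illustration -->
<configuration>
  <property>
    <!-- number of spill streams merged at once (default 10) -->
    <name>io.sort.factor</name>
    <value>50</value>
  </property>
  <property>
    <!-- sort buffer size in MB, allocated from the task heap (default 100) -->
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
</configuration>
```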

On Tue, May 29, 2012 at 12:27 PM, Nikolaos Romanos Katsipoulakis <
popanik@gmail.com> wrote:

> Hey everybody.
> I am working on a recommender system that uses Hadoop for generating Item
> Similarities. Since mahout has the RecommenderJob example, I tried to run
> the recommender in my hadoop (pseudo-) cluster. I noticed that in the
> MapReduce job CooccurrencesMapper - SimilarityReducer, there is a big
> overhead (approximately 6 minutes). When the mapping ends, there is a huge
> time gap until the reducer starts, but one CPU was fully loaded during
> that time. Why is this happening? Is there an I/O operation hidden in this
> time gap?
>
> Thank you
>
